DisCEdge: Distributed Context Management for Large Language Models at the Edge
Mohammadreza Malekabbasi, Minghe Wang, David Bermbach

TL;DR
DisCEdge is a system for managing user context across distributed edge nodes for large language models, improving response times, reducing synchronization overhead, and minimizing client request sizes.
Contribution
DisCEdge introduces a tokenized, distributed context management approach that keeps user context consistent across edge nodes and makes edge deployments of LLMs more efficient.
Findings
Median response times improved by up to 14.46% compared to a raw-text-based system.
Median inter-node synchronization overhead reduced by up to 15% compared to a raw-text-based system.
Client request sizes decreased by a median of 90%.
Abstract
Deploying Large Language Model (LLM) services at the edge benefits latency-sensitive and privacy-aware applications. However, the stateless nature of LLMs makes managing user context (e.g., sessions, preferences) across geo-distributed edge nodes challenging. Existing solutions, such as client-side context storage, introduce network latency and bandwidth overhead, undermining edge deployment advantages. We propose DisCEdge, a distributed context management system that stores and replicates user context in tokenized form across edge nodes. By maintaining context as token sequences, our system avoids redundant computation and enables efficient data replication. We evaluate an open-source prototype in a realistic edge environment. DisCEdge improves median response times by up to 14.46% and lowers median inter-node synchronization overhead by up to 15% compared to a raw-text-based system.…
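The idea of keeping context as token sequences can be illustrated with a minimal sketch. This is not the DisCEdge implementation: `ToyTokenizer`, `EdgeNode`, `handle`, and `replicate_to` are hypothetical names, the whitespace tokenizer stands in for a real LLM tokenizer, and replication is reduced to an in-memory copy. It only shows that a node which receives the replicated token sequence needs to tokenize the new turn alone, and that the client sends a session identifier plus the new text rather than the full history.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ToyTokenizer:
    """Hypothetical stand-in for an LLM tokenizer: one integer id per whitespace-separated word."""

    def __init__(self) -> None:
        self.vocab: dict[str, int] = {}
        self.words_tokenized = 0  # counts tokenization work done so far

    def encode(self, text: str) -> list[int]:
        ids = []
        for word in text.split():
            self.words_tokenized += 1
            # Assign the next free id the first time a word is seen.
            ids.append(self.vocab.setdefault(word, len(self.vocab)))
        return ids


@dataclass
class EdgeNode:
    """Hypothetical edge node that stores each session's context as token ids."""

    tokenizer: ToyTokenizer
    contexts: dict[str, list[int]] = field(default_factory=dict)

    def handle(self, session_id: str, new_text: str) -> list[int]:
        # Tokenize only the new turn and append it to the stored context.
        tokens = self.contexts.setdefault(session_id, [])
        tokens.extend(self.tokenizer.encode(new_text))
        return list(tokens)  # the sequence that would be passed to the model

    def replicate_to(self, other: EdgeNode, session_id: str) -> None:
        # Assumption: replication ships the token sequence, not the raw text.
        other.contexts[session_id] = list(self.contexts[session_id])


# Both nodes share one tokenizer, mirroring nodes that serve the same model
# and therefore use the same vocabulary.
tok = ToyTokenizer()
node_a, node_b = EdgeNode(tok), EdgeNode(tok)

node_a.handle("s1", "hello edge world")   # first turn served by node A
node_a.replicate_to(node_b, "s1")         # user moves; context follows as tokens
ctx = node_b.handle("s1", "hello again")  # node B tokenizes only the new turn
```

In this toy run the two turns contain five words in total and exactly five words are tokenized. A node that stored raw text would have to re-tokenize the three words of the first turn before handling the second.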
