LIME: Accelerating Collaborative Lossless LLM Inference on Memory-Constrained Edge Devices
Mingyu Sun, Xiao Zhang, Shen Qu, Yan Li, Mengbai Xiao, Yuan Yuan, Dongxiao Yu

TL;DR
LIME is a system that enables lossless inference of large language models across multiple memory-limited edge devices by balancing computation and communication, significantly speeding up inference without accuracy loss.
Contribution
LIME introduces a collaborative, lossless inference framework with dynamic resource management and interleaved parallelism for efficient large model deployment on edge devices.
Findings
Achieves 1.7x speedup over baselines under sporadic request patterns
Achieves 3.7x speedup over baselines under bursty request patterns
Maintains model accuracy while improving inference speed
Abstract
Large language models (LLMs) have emerged as a powerful foundation for intelligent reasoning and decision-making, demonstrating substantial impact across a wide range of domains and applications. However, their massive parameter scales and substantial resource demands pose critical challenges for efficient inference on edge devices. These devices are inherently constrained by limited computational power and memory capacity, while bandwidth bottlenecks at the network edge further restrict distributed deployment and real-time responsiveness. Although existing research has explored lightweight optimization techniques to mitigate memory limitations, such approaches often incur significant degradation in model accuracy and performance. To address these challenges, we propose LIME, a collaborative system that enables lossless inference for large models across multiple memory-constrained edge…
Taxonomy
Topics: Big Data and Digital Economy · Advanced Neural Network Applications · Topic Modeling
