Network Edge Inference for Large Language Models: Principles, Techniques, and Opportunities
Zhixiong Chen, Bingjie Zhu, Jiangzhou Wang, Hyundong Shin, Arumugam Nallanathan, and Dusit Niyato

TL;DR
This survey reviews recent advances in running large language models efficiently on network edge devices, addressing challenges in system design, model optimization, and resource management.
Contribution
It provides a comprehensive overview of techniques and future directions for deploying LLMs in resource-constrained edge environments.
Findings
Summarizes recent system architectures and optimization techniques for edge LLM inference.
Identifies key challenges and opportunities in resource management and scheduling.
Maps future research directions to unlock LLM potential at the network edge.
Abstract
Large language models (LLMs) have advanced rapidly, emerging as versatile tools across fields thanks to their exceptional language understanding, generation, and reasoning capabilities. However, performing LLM inference at the network edge remains challenging because of the models' large memory and compute demands. This survey outlines the challenges specific to LLM edge inference and provides a comprehensive overview of recent progress, covering system architectures, model optimization and deployment, and resource management and scheduling. By synthesizing state-of-the-art techniques and mapping future directions, this survey aims to unlock the potential of LLMs in resource-constrained edge environments.
