Mobile Edge Intelligence for Large Language Models: A Contemporary Survey
Guanqiao Qu, Qiyuan Chen, Wei Wei, Zheng Lin, Xianhao Chen, Kaibin Huang

TL;DR
This survey explores how mobile edge intelligence can enable efficient, privacy-preserving deployment of large language models on edge devices by offloading computation to nearby edge servers, addressing resource constraints.
Contribution
The survey gives a comprehensive overview of mobile edge intelligence (MEI) for LLMs, covering architecture, resource-efficient techniques, and future research directions, and synthesizes work across this emerging field.
Findings
Identifies key applications requiring edge LLM deployment.
Summarizes resource-efficient techniques for on-device LLMs.
Outlines architecture supporting edge LLM caching, training, and inference.
Abstract
On-device large language models (LLMs), referring to running LLMs on edge devices, have raised considerable interest since they are more cost-effective, latency-efficient, and privacy-preserving compared with the cloud paradigm. Nonetheless, the performance of on-device LLMs is intrinsically constrained by resource limitations on edge devices. Sitting between cloud and on-device AI, mobile edge intelligence (MEI) presents a viable solution by provisioning AI capabilities at the edge of mobile networks, enabling end users to offload heavy AI computation to capable edge servers nearby. This article provides a contemporary survey on harnessing MEI for LLMs. We begin by illustrating several killer applications to demonstrate the urgent need for deploying LLMs at the network edge. Next, we present the preliminaries of LLMs and MEI, followed by resource-efficient LLM techniques. We then…
Taxonomy
Topics: Recommender Systems and Techniques
Methods: Multi-partition Embedding Interaction
