Mobile Edge Intelligence for Large Language Models: A Contemporary Survey

Guanqiao Qu; Qiyuan Chen; Wei Wei; Zheng Lin; Xianhao Chen; Kaibin Huang

arXiv:2407.18921·cs.NI·March 21, 2025·1 cites

Open Access

TL;DR

This survey explores how mobile edge intelligence (MEI) can support efficient, privacy-preserving use of large language models (LLMs) at the network edge by letting resource-constrained edge devices offload heavy computation to nearby edge servers.

Contribution

The survey gives an overview of MEI for LLMs, covering architecture, techniques, and future research directions in this emerging field.

Findings

1. Identifies key applications requiring edge LLM deployment.
2. Summarizes resource-efficient techniques for on-device LLMs.
3. Outlines architecture supporting edge LLM caching, training, and inference.

Abstract

On-device large language models (LLMs), referring to running LLMs on edge devices, have raised considerable interest since they are more cost-effective, latency-efficient, and privacy-preserving compared with the cloud paradigm. Nonetheless, the performance of on-device LLMs is intrinsically constrained by resource limitations on edge devices. Sitting between cloud and on-device AI, mobile edge intelligence (MEI) presents a viable solution by provisioning AI capabilities at the edge of mobile networks, enabling end users to offload heavy AI computation to capable edge servers nearby. This article provides a contemporary survey on harnessing MEI for LLMs. We begin by illustrating several killer applications to demonstrate the urgent need for deploying LLMs at the network edge. Next, we present the preliminaries of LLMs and MEI, followed by resource-efficient LLM techniques. We then…

Taxonomy

Topics: Recommender Systems and Techniques

Methods: Multi-partition Embedding Interaction