On-Device Language Models: A Comprehensive Review
Jiajun Xu, Zhiyuan Li, Wei Chen, Qun Wang, Xin Gao, Qi Cai, Ziyuan Ling

TL;DR
This paper reviews the development, challenges, and solutions for deploying large language models on resource-constrained edge devices, emphasizing efficient architectures, compression techniques, hardware acceleration, and real-world applications.
Contribution
It provides a comprehensive overview of current methods, challenges, and future research directions for on-device language models, integrating technical solutions and practical case studies.
Findings
Efficient architectures like parameter sharing improve on-device LLM performance.
Compression techniques such as quantization and pruning reduce model size significantly.
Case studies demonstrate successful deployment of on-device LLMs in real-world applications.
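The size reduction from quantization comes from storing each weight in fewer bits. The sketch below is a minimal illustration that assumes symmetric per-tensor int8 quantization. This is one common scheme, not necessarily the one used in any system the review covers. Each float32 weight (4 bytes) is mapped to an int8 value (1 byte) plus one shared scale factor. The helper names `quantize_int8` and `dequantize`, as well as the toy weight values, are hypothetical.

```python
import array

def quantize_int8(weights):
    """Hypothetical helper: symmetric per-tensor int8 quantization with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor so that the largest magnitude maps to 127
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp into the symmetric int8 range
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Hypothetical helper: recover approximate float weights from int8 codes."""
    return [v * scale for v in q]

# Toy weights, not taken from any real model
weights = [0.42, -1.27, 0.003, 0.91, -0.55, 1.10, -0.08, 0.33]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

fp32_bytes = array.array("f", weights).itemsize * len(weights)  # 4 bytes per float32 weight
int8_bytes = array.array("b", q).itemsize * len(q)              # 1 byte per int8 weight
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print(f"float32: {fp32_bytes} bytes, int8: {int8_bytes} bytes, max abs error: {max_err:.4f}")
```

With round-to-nearest, the per-weight reconstruction error is at most half a quantization step (`scale / 2`). This trade of a small precision loss for a 4x storage reduction is the kind of balance the review discusses. The single scale factor adds negligible overhead, although real deployments often use per-channel or per-group scales to reduce error.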
Abstract
The advent of large language models (LLMs) revolutionized natural language processing applications, and running LLMs on edge devices has become increasingly attractive for reasons including reduced latency, data localization, and personalized user experiences. This comprehensive review examines the challenges of deploying computationally expensive LLMs on resource-constrained devices and explores innovative solutions across multiple domains. The paper investigates the development of on-device language models, their efficient architectures, including parameter sharing and modular designs, as well as state-of-the-art compression techniques like quantization, pruning, and knowledge distillation. Hardware acceleration strategies and collaborative edge-cloud deployment approaches are analyzed, highlighting the intricate balance between performance and resource utilization. Case studies of…
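Knowledge distillation, one of the compression techniques listed above, trains a small "student" model to imitate a larger "teacher." The sketch below is a minimal illustration assuming the classic soft-target formulation. In that formulation, the loss mixes (a) the KL divergence between temperature-softened teacher and student distributions and (b) ordinary cross-entropy on the ground-truth label. The review may survey other variants. The function names, `temperature`, `alpha`, and the logits are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Hypothetical helper: weighted sum of a soft-target KL term and a hard-label CE term."""
    # Soft term: match the teacher's softened output distribution.
    # Scaling by T^2 keeps its gradient magnitude comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student_soft) * temperature ** 2
    # Hard term: standard cross-entropy against the ground-truth class index
    hard = -math.log(softmax(student_logits)[label])
    return alpha * soft + (1 - alpha) * hard

teacher = [4.0, 1.0, 0.5]  # toy teacher logits for a 3-class problem
close_student = [3.8, 1.1, 0.4]
far_student = [0.5, 3.0, 1.0]

print(distillation_loss(close_student, teacher, label=0))
print(distillation_loss(far_student, teacher, label=0))
```

A student whose logits track the teacher's gets a lower loss than one that disagrees. Minimizing this loss pushes the smaller model toward the teacher's behavior.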
Taxonomy
Topics: IoT and Edge/Fog Computing · Smart Cities and Technologies · Context-Aware Activity Recognition Systems
