Empirical Guidelines for Deploying LLMs onto Resource-constrained Edge Devices
Ruiyang Qin, Dancheng Liu, Chenhui Xu, Zheyu Yan, Zhaoxuan Tan, Zhenge Jia, Amir Nassereldine, Jiajie Li, Meng Jiang, Ahmed Abbasi, Jinjun Xiong, Yiyu Shi

TL;DR
This paper empirically investigates how to effectively deploy large language models on resource-limited edge devices, providing practical guidelines based on extensive experiments for optimizing customization and inference.
Contribution
It offers novel empirical guidelines for deploying LLMs on resource-constrained devices, considering various design tradeoffs and their impacts on learning efficiency and accuracy.
Findings
Optimal parameter learning vs. RAG depends on task difficulty
Longer fine-tuning time may not improve performance
Compressed LLMs can outperform uncompressed ones with limited data
Abstract
The scaling laws have become the de facto guidelines for designing large language models (LLMs), but they were studied under the assumption of unlimited computing resources for both training and inference. As LLMs are increasingly used as personalized intelligent assistants, their customization (i.e., learning through fine-tuning) and deployment onto resource-constrained edge devices will become more and more prevalent. An urgent but open question is how a resource-constrained computing environment would affect the design choices for a personalized LLM. We study this problem empirically in this work. In particular, we consider the tradeoffs among a number of key design factors and their intertwined impacts on learning efficiency and accuracy. The factors include the learning methods for LLM customization, the amount of personalized data used for learning customization, the types and…
Taxonomy
Topics: Advanced Data Storage Technologies · Data Stream Mining Techniques · Semantic Web and Ontologies
Methods: Attention Is All You Need · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Softmax · Layer Normalization · BERT · Linear Layer · Byte Pair Encoding
