Understanding Large Language Models in Your Pockets: Performance Study on COTS Mobile Devices
Jie Xiao, Qianyi Huang, Xu Chen, Chen Tian

TL;DR
This paper provides a comprehensive measurement study of lightweight large language models running on commercial mobile devices, analyzing their performance, resource use, and system differences to guide future development.
Contribution
It offers the first detailed evaluation of LLM performance on real mobile hardware, comparing user experience and system factors across major mobile SoCs.
Findings
Significant performance variation across different mobile SoCs.
Resource utilization and battery consumption are critical factors for mobile LLM deployment.
Insights into bottlenecks and system behaviors to inform future mobile LLM design.
Abstract
As large language models (LLMs) increasingly integrate into every aspect of our work and daily lives, there are growing concerns about user privacy, which push the trend toward local deployment of these models. There are a number of lightweight LLMs (e.g., Gemini Nano, LLAMA2 7B) that can run locally on smartphones, providing users with greater control over their personal data. As a rapidly emerging application, we are concerned about their performance on commercial-off-the-shelf mobile devices. To fully understand the current landscape of LLM deployment on mobile platforms, we conduct a comprehensive measurement study on mobile devices. While user experience is the primary concern for end-users, developers focus more on the underlying implementations. Therefore, we evaluate both user-centric metrics (such as token throughput, latency, and response quality) and developer-critical factors,…
Taxonomy
Topics: Topic Modeling
