MELTing point: Mobile Evaluation of Language Transformers
Stefanos Laskaridis, Kleomenis Katevas, Lorenzo Minto, Hamed Haddadi

TL;DR
This paper systematically evaluates the performance, energy, and accuracy of large language models running on mobile devices, highlighting current limitations and potential improvements for on-device AI deployment.
Contribution
The paper introduces MELT, an automation infrastructure for benchmarking LLMs on mobile devices, and uses it to provide the first comprehensive analysis of on-device LLM execution across various models and hardware.
Findings
LLM inference is largely memory-bound.
Quantization reduces memory but impacts accuracy.
Energy and thermal constraints hinder continuous on-device execution.
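The first two findings are connected. During token-by-token decoding, every generated token requires reading roughly all model weights from memory once. As a result, memory bandwidth rather than compute tends to cap throughput, and lowering the bits per weight shrinks both the footprint and the bytes moved per token. The sketch below is a back-of-the-envelope estimate, not MELT's methodology. The names (ModelSpec, weight_bytes, decode_tokens_per_s_upper_bound), the 7B parameter count and the 50 GB/s bandwidth are hypothetical, illustrative assumptions and not figures from the paper. It also ignores the accuracy cost of quantization, which the paper measures separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelSpec:
    name: str
    n_params: float  # total parameter count (hypothetical example values below)


def weight_bytes(spec: ModelSpec, bits_per_weight: float) -> float:
    """Bytes needed to store the weights at the given precision.

    Ignores the KV cache, activations and per-group quantization scales,
    so real footprints are somewhat larger.
    """
    return spec.n_params * bits_per_weight / 8


def decode_tokens_per_s_upper_bound(
    spec: ModelSpec, bits_per_weight: float, mem_bandwidth_gb_s: float
) -> float:
    """Roofline-style ceiling for memory-bound decoding.

    Assumes each generated token streams all weights from memory exactly once
    and that nothing else (KV cache reads, compute) limits throughput.
    """
    bytes_per_token = weight_bytes(spec, bits_per_weight)
    return mem_bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    spec = ModelSpec("hypothetical-7b", 7e9)
    for bits in (16, 8, 4):
        gb = weight_bytes(spec, bits) / 1e9
        tps = decode_tokens_per_s_upper_bound(spec, bits, mem_bandwidth_gb_s=50.0)
        print(f"{bits:>2}-bit: {gb:5.1f} GB of weights, <= {tps:5.1f} tokens/s")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the footprint by 4x and raises the bandwidth-imposed decode ceiling by the same factor. This matches the intuition behind a memory-bound workload, although measured speedups on real devices will be lower.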
Abstract
Transformers have revolutionized the machine learning landscape, gradually making their way into everyday tasks and equipping our computers with "sparks of intelligence". However, their runtime requirements have prevented them from being broadly deployed on mobile. As personal devices become increasingly powerful and prompt privacy becomes an ever more pressing issue, we explore the current state of mobile execution of Large Language Models (LLMs). To achieve this, we have created our own automation infrastructure, MELT, which supports the headless execution and benchmarking of LLMs on device, supporting different models, devices and frameworks, including Android, iOS and Nvidia Jetson devices. We evaluate popular instruction fine-tuned LLMs and leverage different frameworks to measure their end-to-end and granular performance, tracing their memory and energy requirements along the way.…
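The abstract mentions tracing energy requirements alongside performance. A common way to turn a power trace into energy is to integrate the sampled power over time and normalize by the amount of work done. The minimal sketch below assumes power samples in watts at known timestamps in seconds, from whatever monitor is available, and uses trapezoidal integration. It is not MELT's actual implementation, and the function names energy_joules and energy_per_token are hypothetical.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate a sampled power trace (W) over time (s) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two (timestamp, power) samples of equal length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def energy_per_token(
    timestamps_s: Sequence[float], power_w: Sequence[float], n_tokens: int
) -> float:
    """Average energy (J) spent per generated token over the traced interval."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return energy_joules(timestamps_s, power_w) / n_tokens


if __name__ == "__main__":
    # Hypothetical trace: a steady 4 W draw over 2 s while generating 16 tokens.
    ts = [0.0, 0.5, 1.0, 1.5, 2.0]
    pw = [4.0, 4.0, 4.0, 4.0, 4.0]
    print(energy_joules(ts, pw), "J total,", energy_per_token(ts, pw, 16), "J/token")
```

In practice, one would likely subtract an idle-power baseline before integrating so that the result reflects the energy attributable to inference. That step is omitted here for brevity.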
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques
