MELTing point: Mobile Evaluation of Language Transformers

Stefanos Laskaridis; Kleomenis Katevas; Lorenzo Minto; Hamed Haddadi

arXiv:2403.12844 · cs.LG · July 29, 2024 · 2 cites

TL;DR

This paper systematically evaluates the performance, energy, and accuracy of large language models running on mobile devices, highlighting current limitations and potential improvements for on-device AI deployment.

Contribution

It introduces MELT, an automation infrastructure for benchmarking LLMs on mobile devices, providing the first comprehensive analysis of on-device LLM execution across various models and hardware.

Findings

1. LLM inference is largely memory-bound.
2. Quantization reduces memory but impacts accuracy.
3. Energy and thermal constraints hinder continuous on-device execution.
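
The first two findings can be made concrete with standard back-of-envelope arithmetic. The sketch below is not taken from the paper or from MELT. It assumes a hypothetical 7B-parameter model, a hypothetical 50 GB/s memory bandwidth, and single-batch decoding in which every generated token reads all weights from memory once, so the decode rate is capped at roughly bandwidth divided by weight size. KV-cache traffic, activations and compute limits are ignored.

```python
# Back-of-envelope estimates for single-batch LLM decoding on a phone.
# All constants are illustrative assumptions, not measurements from MELT.

def weight_footprint_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

def decode_tokens_per_s_upper_bound(weight_gb: float, bandwidth_gb_s: float) -> float:
    """Memory-bound ceiling: assume each generated token streams all weights once."""
    return bandwidth_gb_s / weight_gb

NUM_PARAMS = 7e9        # hypothetical 7B-parameter model
BANDWIDTH_GB_S = 50.0   # hypothetical DRAM bandwidth of a flagship phone

for bits in (16, 8, 4):
    size = weight_footprint_gb(NUM_PARAMS, bits)
    ceiling = decode_tokens_per_s_upper_bound(size, BANDWIDTH_GB_S)
    print(f"{bits:>2}-bit: {size:5.1f} GB of weights, at most {ceiling:5.1f} tokens/s")
```

Under these assumptions, going from 16-bit to 4-bit weights shrinks the footprint from 14 GB to 3.5 GB and raises the bandwidth ceiling fourfold, which is why quantization is attractive on memory-bound devices; whether the accuracy cost is acceptable still has to be measured per model.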

Abstract

Transformers have revolutionized the machine learning landscape, gradually making their way into everyday tasks and equipping our computers with "sparks of intelligence". However, their runtime requirements have prevented them from being broadly deployed on mobile. As personal devices become increasingly powerful and prompt privacy becomes an ever more pressing issue, we explore the current state of mobile execution of Large Language Models (LLMs). To achieve this, we have created our own automation infrastructure, MELT, which supports the headless execution and benchmarking of LLMs on device, supporting different models, devices and frameworks, including Android, iOS and Nvidia Jetson devices. We evaluate popular instruction fine-tuned LLMs and leverage different frameworks to measure their end-to-end and granular performance, tracing their memory and energy requirements along the way.…
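
The abstract mentions tracing energy requirements during inference. One common way to turn a sampled power trace (for example, from an external power monitor) into an energy figure is trapezoidal integration of power over time, optionally normalized by the number of generated tokens. The sketch below assumes that approach; the function names and the trace values are hypothetical and are neither MELT's implementation nor data from the paper.

```python
# Sketch: convert a sampled power trace into total energy and energy per token.
# The trace below is made up for illustration; it is not data from the paper.

def trapezoid_energy_j(timestamps_s: list[float], power_w: list[float]) -> float:
    """Integrate power (W) over time (s) with the trapezoidal rule, giving joules."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two aligned (time, power) samples")
    energy = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        # Area of the trapezoid between consecutive samples.
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy

def energy_per_token_j(energy_j: float, num_tokens: int) -> float:
    """Average energy spent per generated token."""
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return energy_j / num_tokens

# Hypothetical 2-second trace sampled every 0.5 s while generating 20 tokens.
t = [0.0, 0.5, 1.0, 1.5, 2.0]
p = [4.0, 6.0, 6.0, 5.0, 3.0]
total = trapezoid_energy_j(t, p)
print(f"{total:.2f} J total, {energy_per_token_j(total, 20):.2f} J/token")
```

For this made-up trace the integral is 10.25 J, or about 0.51 J per token. Subtracting a separately measured idle baseline before integrating is one possible refinement if only the inference cost is of interest.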

Code & Models

Repositories

brave-experiments/melt-public (official)

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques