Challenging GPU Dominance: When CPUs Outperform for On-Device LLM Inference
Haolin Zhang, Jeff Huang

TL;DR
This paper demonstrates that, contrary to common belief, CPUs can outperform GPUs for on-device large language model inference under certain conditions, owing to factors such as GPU memory transfer overhead and CPU thread optimization.
Contribution
It provides empirical evidence and architectural analysis showing that, under certain conditions, CPU-only inference can outperform GPU-accelerated LLM inference on mobile devices, challenging the assumption that GPUs always dominate.
Findings
CPU-only inference (two threads, F16) achieved higher throughput than GPU-accelerated inference on the iPhone 15 Pro: 17 vs. 12.8 tokens/sec.
GPU memory transfer overhead plays a critical role in limiting GPU inference speed.
Thread optimization and quantization strategies influence performance outcomes.
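The headline comparison can also be restated as a throughput ratio and an average per-token latency. The short Python sketch below is illustrative arithmetic only: it uses just the two throughput figures quoted in the abstract and treats average latency as the reciprocal of throughput, which ignores any per-token variance.

```python
# Throughput figures quoted in the abstract (tokens per second).
CPU_TOKENS_PER_SEC = 17.0   # CPU-only, two threads, F16
GPU_TOKENS_PER_SEC = 12.8   # GPU-accelerated


def per_token_latency_ms(tokens_per_sec: float) -> float:
    """Average time to produce one token, in milliseconds."""
    return 1000.0 / tokens_per_sec


speedup = CPU_TOKENS_PER_SEC / GPU_TOKENS_PER_SEC
cpu_ms = per_token_latency_ms(CPU_TOKENS_PER_SEC)
gpu_ms = per_token_latency_ms(GPU_TOKENS_PER_SEC)

print(f"CPU/GPU throughput ratio: {speedup:.2f}x")  # prints 1.33x
print(f"CPU: {cpu_ms:.1f} ms/token, GPU: {gpu_ms:.1f} ms/token")
print(f"GPU path is slower by {gpu_ms - cpu_ms:.1f} ms per token on average")
```

In other words, the CPU-only configuration is roughly a third faster, which works out to about 19 ms saved per generated token.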
Abstract
The common assumption in on-device AI is that GPUs, with their superior parallel processing, always provide the best performance for large language model (LLM) inference. In this work, we challenge this notion by empirically demonstrating that, under certain conditions, CPUs can outperform GPUs for LLM inference on mobile devices. Using a 1-billion-parameter LLM deployed via llama.cpp on the iPhone 15 Pro, we show that a CPU-only configuration (two threads, F16 precision) achieves 17 tokens per second, surpassing the 12.8 tokens per second obtained with GPU acceleration. We analyze the architectural factors driving this counterintuitive result, revealing that GPU memory transfer overhead and CPU thread optimization play a critical role. Furthermore, we explore the impact of thread oversubscription, quantization strategies, and hardware constraints, providing new insights into efficient…
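To see why memory movement can matter as much as raw compute, a common first-order approximation for token-by-token decoding is that each generated token reads every model weight once. The sketch below applies that approximation under assumptions that are not from the paper: exactly 1e9 parameters, no KV-cache or activation traffic, and llama.cpp's Q8_0 and Q4_0 block layouts (32 weights plus one FP16 scale per block) as example quantization formats, since the abstract does not say which formats were evaluated. The output is an estimate of the weight-read rate implied by 17 tokens/sec, not a measured bandwidth.

```python
# Hypothetical back-of-envelope model; the parameter count and formats are assumptions.
NUM_PARAMS = 1_000_000_000  # assumption: exactly 1e9 weights

# Approximate storage cost per weight, in bits. Q8_0 and Q4_0 pack blocks of
# 32 quantized weights together with one 16-bit (FP16) scale per block.
BITS_PER_WEIGHT = {
    "F16": 16.0,
    "Q8_0": (32 * 8 + 16) / 32,  # 8.5 bits per weight
    "Q4_0": (32 * 4 + 16) / 32,  # 4.5 bits per weight
}


def weight_footprint_gb(fmt: str, num_params: int = NUM_PARAMS) -> float:
    """Size of all weights in decimal gigabytes for the given format."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


def implied_bandwidth_gb_s(fmt: str, tokens_per_sec: float,
                           num_params: int = NUM_PARAMS) -> float:
    """Weight bytes read per second if each token streams every weight once."""
    return weight_footprint_gb(fmt, num_params) * tokens_per_sec


for fmt in BITS_PER_WEIGHT:
    print(f"{fmt}: ~{weight_footprint_gb(fmt):.2f} GB of weights")

print(f"Implied weight reads at 17 tok/s (F16): "
      f"~{implied_bandwidth_gb_s('F16', 17.0):.0f} GB/s")
```

Under this approximation, lowering the bits per weight shrinks the bytes each token must read in proportion, so the choice of quantization format directly changes the memory traffic per token.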
Taxonomy
Topics: Natural Language Processing Techniques · Machine Learning in Materials Science · Big Data and Digital Economy
