Dissecting the Impact of Mobile DVFS Governors on LLM Inference Performance and Energy Efficiency

Zongpu Zhang; Pranab Dash; Y. Charlie Hu; Qiang Xu; Jian Li; Haibing Guan

arXiv:2507.02135 · cs.OS · July 4, 2025

TL;DR

This paper investigates how independent DVFS governors for CPU, GPU, and memory in mobile devices affect LLM inference efficiency, and proposes FUSE, a unified governor that improves latency without increasing energy consumption.

Contribution

The paper provides a detailed measurement study of how separate CPU, GPU, and memory governors affect LLM inference, and introduces FUSE, a unified energy-aware governor for mobile LLM deployment.

Findings

01

The independent CPU, GPU, and memory governors (the "triplet") cause up to 40.4% longer prefilling and decoding latency than optimal frequency combinations.

02

FUSE reduces inference latency by up to 36.8% while maintaining the same energy per token.

03

A unified governor improves both energy efficiency and inference speed for mobile LLMs.

Abstract

Large Language Models (LLMs) are increasingly being integrated into various applications and services running on billions of mobile devices. However, deploying LLMs on resource-limited mobile devices faces a significant challenge due to their high demand for computation, memory, and ultimately energy. Current LLM frameworks for mobile devices use three power-hungry components (CPU, GPU, and memory) even when running primarily-GPU LLM models, yet the optimized DVFS governors for CPU, GPU, and memory featured in modern mobile devices operate independently and are oblivious of each other. Motivated by this observation, in this work we first measure the energy efficiency of a SOTA LLM framework with various LLM models on mobile phones, which shows that the triplet mobile governors result in up to 40.4% longer prefilling and decoding latency compared to optimal combinations of CPU, GPU, and…
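To make the notion of an "optimal frequency combination" concrete, the following is a minimal Python sketch of an exhaustive search over (CPU, GPU, memory) frequency triplets that picks the lowest-latency setting within an energy-per-token budget. It is not FUSE and not the authors' measurement procedure; the names `best_triplet`, `Measurement`, and `toy_profile` are hypothetical, and the coefficients inside `toy_profile` are made-up placeholders rather than measured data. On a real device, `profile` would pin the three frequencies and measure latency and energy per token.

```python
from itertools import product
from typing import Callable, NamedTuple, Optional, Sequence, Tuple

# Hypothetical representation: (cpu_mhz, gpu_mhz, mem_mhz).
Triplet = Tuple[int, int, int]


class Measurement(NamedTuple):
    latency_ms: float  # latency per token
    energy_mj: float   # energy per token


def best_triplet(
    cpu_freqs: Sequence[int],
    gpu_freqs: Sequence[int],
    mem_freqs: Sequence[int],
    profile: Callable[[Triplet], Measurement],
    energy_budget_mj: float,
) -> Optional[Tuple[Triplet, Measurement]]:
    """Return the lowest-latency triplet whose energy per token fits the budget.

    Returns None when no triplet satisfies the budget.
    """
    best: Optional[Tuple[Triplet, Measurement]] = None
    for triplet in product(cpu_freqs, gpu_freqs, mem_freqs):
        m = profile(triplet)
        if m.energy_mj > energy_budget_mj:
            continue  # over the energy budget, skip
        if best is None or m.latency_ms < best[1].latency_ms:
            best = (triplet, m)
    return best


def toy_profile(triplet: Triplet) -> Measurement:
    """Placeholder model with invented coefficients (NOT measured data)."""
    cpu, gpu, mem = triplet
    latency_ms = 20000.0 / cpu + 60000.0 / gpu + 40000.0 / mem
    power_mw = 0.5 * cpu + 1.0 * gpu + 0.4 * mem
    # mW * ms = microjoules; divide by 1000 to get millijoules.
    return Measurement(latency_ms, power_mw * latency_ms / 1000.0)


if __name__ == "__main__":
    result = best_triplet(
        cpu_freqs=[600, 1200, 1800],
        gpu_freqs=[300, 600, 900],
        mem_freqs=[800, 1600, 3200],
        profile=toy_profile,
        energy_budget_mj=400.0,
    )
    print(result)
```

Setting the budget to the energy per token observed under the default governors would yield the kind of equal-energy comparison the findings describe. Exhaustive search is only practical offline, since the number of triplets grows as the product of the three frequency tables.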
