MNN-AECS: Energy Optimization for LLM Decoding on Mobile Devices via Adaptive Core Selection

Zhengxiang Huang; Chaoyue Niu; Zhaode Wang; Jiarui Xue; Hanming Zhang; Yugang Wang; Zewei Xin; Xiaotang Jiang; Chengfei Lv; Fan Wu; Guihai Chen

arXiv:2506.19884 · cs.OS · June 26, 2025

TL;DR

This paper presents MNN-AECS, an energy-efficient system for on-device LLM decoding that dynamically selects CPU cores to reduce energy consumption without significantly impacting decoding speed.

Contribution

The paper introduces Adaptive Energy-Centric Core Selection (AECS), an adaptive core selection method integrated into MNN that enables energy-efficient LLM decoding on mobile devices without root access or OS modifications.

Findings

01. Reduces energy use by 23% on average across devices.

02. Achieves 39% to 78% energy savings compared to other engines.

03. Maintains acceptable decoding speed with minimal slowdown.

Abstract

As the demand for on-device Large Language Model (LLM) inference grows, energy efficiency has become a major concern, especially for battery-limited mobile devices. Our analysis shows that the memory-bound LLM decode phase dominates energy use, yet most existing works focus on accelerating the prefill phase and neglect energy concerns. We introduce Adaptive Energy-Centric Core Selection (AECS) and integrate it into MNN to create an energy-efficient version, MNN-AECS, the first engine-level system solution for energy-efficient LLM decoding that requires neither root access nor OS modifications. MNN-AECS is designed to reduce LLM decoding energy while keeping decode speed within an acceptable slowdown threshold by dynamically selecting low-power CPU cores. MNN-AECS is evaluated across 5 Android and 2 iOS devices on 5 popular LLMs of various sizes. Compared to original MNN, MNN-AECS cuts…
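
The trade-off described in the abstract, lower decoding energy subject to a bound on slowdown, can be illustrated with a minimal sketch. This is not the authors' AECS algorithm, which this listing does not specify. It assumes a hypothetical set of candidate core configurations whose decode speed and power have already been measured, and it picks the one with the lowest energy per token among those that stay within a chosen slowdown threshold relative to the fastest candidate. The names `CoreConfig`, `select_config`, and `max_slowdown` are invented for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class CoreConfig:
    """Hypothetical profile of one core selection (not an MNN type)."""
    name: str
    tokens_per_s: float  # measured decode speed
    power_w: float       # measured average power while decoding

    @property
    def joules_per_token(self) -> float:
        # Energy per token = power (J/s) divided by speed (tokens/s).
        return self.power_w / self.tokens_per_s


def select_config(candidates: Sequence[CoreConfig],
                  max_slowdown: float) -> CoreConfig:
    """Return the lowest-energy candidate within the slowdown bound.

    max_slowdown is a fraction in [0, 1): 0.2 tolerates decoding up to
    20% slower than the fastest candidate. This is an assumed
    formulation of "acceptable slowdown threshold".
    """
    if not candidates:
        raise ValueError("need at least one candidate configuration")
    if not 0.0 <= max_slowdown < 1.0:
        raise ValueError("max_slowdown must be in [0, 1)")

    fastest = max(c.tokens_per_s for c in candidates)
    min_speed = fastest * (1.0 - max_slowdown)

    # The fastest candidate always qualifies, so this list is never empty.
    eligible = [c for c in candidates if c.tokens_per_s >= min_speed]
    return min(eligible, key=lambda c: c.joules_per_token)


if __name__ == "__main__":
    # Made-up numbers for illustration only; not measurements from the paper.
    profiles = [
        CoreConfig("big cores", tokens_per_s=10.0, power_w=5.0),
        CoreConfig("mid cores", tokens_per_s=9.0, power_w=3.0),
        CoreConfig("little cores", tokens_per_s=4.0, power_w=1.0),
    ]
    print(select_config(profiles, max_slowdown=0.2).name)
```

With these made-up profiles, the little cores use the least energy per token but fall outside a 20% slowdown bound, so the sketch selects the mid cores. Loosening the bound lets it move to lower-power cores.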
