TokenPowerBench: Benchmarking the Power Consumption of LLM Inference
Chenxu Niu, Wei Zhang, Jie Li, Yongjian Zhao, Tongyang Wang, Xi Wang, Yong Chen

TL;DR
TokenPowerBench is a lightweight benchmarking tool for detailed analysis of the power consumption of large language model (LLM) inference, intended to support energy-efficiency improvements and sustainability efforts.
Contribution
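It introduces the first extensible benchmark for measuring LLM inference power consumption. The benchmark pairs a measurement layer that captures GPU-, node-, and system-level power with configurable parameters for model choice, prompt set, and inference engine.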
Findings
Power measurement demonstrated across multiple models and model sizes.
Insights into how batch size, context length, and quantization affect energy use.
Open-source release for community adoption and benchmarking.
Abstract
Large language model (LLM) services now answer billions of queries per day, and industry reports show that inference, not training, accounts for more than 90% of total power consumption. However, existing benchmarks focus either on training/fine-tuning or on inference performance, and they provide little support for measuring and analyzing the power consumption of inference. We introduce TokenPowerBench, the first lightweight and extensible benchmark designed for LLM-inference power consumption studies. The benchmark combines (i) a declarative configuration interface covering model choice, prompt set, and inference engine, (ii) a measurement layer that captures GPU-, node-, and system-level power without specialized power meters, and (iii) a phase-aligned metrics pipeline that attributes energy to the prefill and decode stages of every request. These elements make it straightforward to explore…
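The abstract does not spell out how energy is attributed to the prefill and decode stages. What follows is a minimal sketch, not TokenPowerBench's implementation. It assumes the measurement layer yields a time-ordered list of (timestamp in seconds, power in watts) samples. It also assumes the inference engine logs per-request timestamps for request start, end of prefill, and end of decode. Energy for each phase is the integral of the power trace over that phase's window, computed here with the trapezoid rule. The names `PhaseTimestamps`, `window_energy`, `attribute_request_energy`, and `energy_per_token` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical power trace: (timestamp_s, power_w) pairs sorted by timestamp.
PowerTrace = List[Tuple[float, float]]


@dataclass
class PhaseTimestamps:
    """Hypothetical per-request phase boundaries (seconds) logged by the engine."""
    start: float
    prefill_end: float
    end: float


def _interp(ta: float, pa: float, tb: float, pb: float, t: float) -> float:
    """Linearly interpolate power at time t between samples (ta, pa) and (tb, pb)."""
    return pa + (pb - pa) * (t - ta) / (tb - ta)


def window_energy(trace: PowerTrace, t0: float, t1: float) -> float:
    """Integrate power (W) over [t0, t1] with the trapezoid rule; returns joules."""
    energy = 0.0
    for (ta, pa), (tb, pb) in zip(trace, trace[1:]):
        # Clip each sample interval to the phase window.
        lo, hi = max(ta, t0), min(tb, t1)
        if hi <= lo:
            continue  # interval lies outside the window (or has zero width)
        p_lo = _interp(ta, pa, tb, pb, lo)
        p_hi = _interp(ta, pa, tb, pb, hi)
        energy += 0.5 * (p_lo + p_hi) * (hi - lo)
    return energy


def attribute_request_energy(trace: PowerTrace, req: PhaseTimestamps) -> Tuple[float, float]:
    """Split one request's energy into (prefill_joules, decode_joules)."""
    prefill = window_energy(trace, req.start, req.prefill_end)
    decode = window_energy(trace, req.prefill_end, req.end)
    return prefill, decode


def energy_per_token(joules: float, tokens: int) -> float:
    """Average energy per generated token (J/token)."""
    return joules / tokens


trace = [(0.0, 100.0), (1.0, 300.0), (2.0, 300.0), (3.0, 200.0)]
req = PhaseTimestamps(start=0.0, prefill_end=1.0, end=3.0)
prefill_j, decode_j = attribute_request_energy(trace, req)
print(prefill_j, decode_j)  # 200.0 550.0
print(energy_per_token(decode_j, 50))  # 11.0
```

This sketch assumes one request occupies the device at a time. When requests are batched, their phase windows overlap, and a real pipeline would also need a rule for sharing the measured energy among concurrent requests. The sketch makes no claim about which rule the paper uses.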
Taxonomy
Topics: Big Data and Digital Economy · Green IT and Sustainability · Parallel Computing and Optimization Techniques
