Beyond Test-Time Compute Strategies: Advocating Energy-per-Token in LLM Inference

Patrick Wilhelm; Thorsten Wittkopp; Odej Kao

arXiv:2603.20224·cs.CL·March 24, 2026·EuroMLSys

TL;DR

This paper investigates the energy-accuracy trade-offs in large and small language models during inference, proposing energy-per-token as a new metric and dynamic reasoning control to promote sustainable AI practices.

Contribution

It introduces energy-per-token as a novel metric and advocates for energy-aware inference strategies, including dynamic reasoning control, to optimize sustainability in LLM deployment.

Findings

01 Energy-per-Token effectively measures inference energy efficiency.

02 Dynamic reasoning control balances accuracy and energy consumption.

03 Transformer input-output token dynamics influence energy consumption patterns.
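
To make the energy-per-token finding concrete, here is a minimal sketch of how such a metric could be computed. The summary does not give the paper's exact definition, so this assumes the simplest reading: total inference energy in joules divided by a token count. The function names, the fixed-interval power sampling, and the choice of which tokens to count are illustrative assumptions, not the authors' method.

```python
from typing import Sequence


def energy_from_power_samples(power_watts: Sequence[float], interval_s: float) -> float:
    """Approximate energy in joules from power readings taken at a fixed interval.

    Assumption: each reading holds for one full interval (rectangle rule).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return sum(power_watts) * interval_s


def energy_per_token(energy_joules: float, num_tokens: int) -> float:
    """Energy per token in joules/token.

    Assumption: num_tokens is whatever count the analysis chooses
    (output tokens only, or input plus output); the summary does not say.
    """
    if num_tokens <= 0:
        raise ValueError("num_tokens must be positive")
    return energy_joules / num_tokens


# Hypothetical readings: three power samples, 0.5 s apart, for a 150-token response.
joules = energy_from_power_samples([100.0, 200.0, 300.0], 0.5)
print(energy_per_token(joules, 150))
```

Normalizing by token count lets requests of different lengths be compared on one scale, which is the property the finding relies on.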

Abstract

Large Language Models (LLMs) demonstrate exceptional performance across diverse tasks but come with substantial energy and computational costs, particularly in request-heavy scenarios. In many real-world applications, the full scale and capabilities of LLMs are often unnecessary, as Small Language Models (SLMs) can provide accurate responses for simpler text generation tasks. When enhanced with advanced reasoning strategies, such as Chain-of-Thought (CoT) prompting or Majority Voting, SLMs can approach the performance of larger models while reducing overall computational requirements. However, these strategies can also introduce additional energy costs, creating an energy-accuracy trade-off. Our analysis examines these trade-offs in test-time compute strategies for smaller models compared to larger ones, using the MMLU benchmark. Additionally, we explore the input-output token dynamics…
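
The Majority Voting strategy named in the abstract can be sketched as follows. This is a generic illustration rather than the paper's implementation: `majority_vote` and the sample answers are hypothetical, and the tie-breaking rule (first answer seen wins) is an assumption. It also shows where the energy-accuracy trade-off comes from, since every extra sample is a further generation pass.

```python
from collections import Counter
from typing import Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer among several sampled responses.

    Ties go to the answer that appeared first, because Counter.most_common
    keeps insertion order for equal counts.
    """
    if not answers:
        raise ValueError("answers must not be empty")
    return Counter(answers).most_common(1)[0][0]


# Hypothetical multiple-choice answers sampled five times from a small model.
samples = ["B", "A", "B", "C", "B"]
print(majority_vote(samples))
```

Because each sample must be generated separately, the token count (and with it the energy spent) grows with the number of samples, while the accuracy gain depends on how often the model's individual answers agree.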

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning in Materials Science · Big Data and Digital Economy