SweetSpot: An Analytical Model for Predicting Energy Efficiency of LLM Inference
Hiari Pizzini Cavagna, Andrea Proia, Giacomo Madella, Giovanni B. Esposito, Francesco Antici, Daniele Cesarini, Zeynep Kiziltan, Andrea Bartolini

TL;DR
SweetSpot is an analytical model that predicts the energy efficiency of LLM inference based on input and output sequence lengths, revealing optimal points for energy savings.
Contribution
The paper introduces SweetSpot, a novel non-linear model derived from Transformer architecture complexity to accurately predict LLM inference energy consumption.
Findings
Energy efficiency peaks at short-to-moderate inputs and medium outputs.
The model achieves a mean MAPE of 1.79% across diverse LLMs.
Aligning sequence lengths with SweetSpot reduces energy use by up to 33.41x.
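For readers unfamiliar with the accuracy metric quoted above, here is a minimal sketch of how mean absolute percentage error (MAPE) is conventionally computed. The function name `mape` and the sample values are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
def mape(measured, predicted):
    """Mean absolute percentage error, in percent.

    `measured` and `predicted` are equal-length sequences of energy values
    (e.g. joules). Measured values must be non-zero.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average the relative error of each prediction, then scale to percent.
    total = sum(abs(m - p) / abs(m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


# Hypothetical measurements vs. model predictions (made-up numbers).
example_error = mape([100.0, 200.0], [98.0, 204.0])
```

A lower value means the predicted energy tracks the measured energy more closely.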
Abstract
Large Language Model (LLM) inference is central to modern AI applications and dominates worldwide datacenter workloads, which makes predicting its energy footprint critical. Existing approaches estimate energy consumption as a simple linear function of input and output sequence lengths. However, by analyzing the autoregressive structure of Transformers, which implies a fundamentally non-linear relationship between input and output sequence lengths and energy consumption, we demonstrate the existence of a generation energy minimum. Peak efficiency occurs with short-to-moderate inputs and medium-length outputs, while efficiency drops sharply for long inputs or very short outputs. Consequently, we propose SweetSpot, an analytical model derived from the computational and memory-access complexity of the Transformer architecture, which accurately characterizes the efficiency curve as a function of input…
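To make the intuition behind an interior energy minimum concrete, the following is a toy sketch and not the SweetSpot formulation, which the truncated abstract does not give. It assumes a fixed per-request overhead, a prefill cost with linear and quadratic terms in input length, and a per-step decode cost that grows with the amount of cached context. All function names and coefficients are hypothetical and chosen only for illustration.

```python
def toy_energy_joules(n_in, n_out, c0=50.0, a=0.02, b=1e-5, d=0.5, e=1e-4):
    """Toy total energy for one request. All coefficients are made up."""
    # Prefill: linear (projection/MLP-like) plus quadratic (attention-like) terms.
    prefill = a * n_in + b * n_in ** 2
    # Decode: step t pays a constant cost plus a cost proportional to the
    # context it attends over (n_in prompt tokens + t generated tokens).
    decode = sum(d + e * (n_in + t) for t in range(n_out))
    return c0 + prefill + decode


def energy_per_output_token(n_in, n_out):
    """Energy divided by generated tokens; lower is more efficient."""
    return toy_energy_joules(n_in, n_out) / n_out


def best_output_length(n_in, candidates):
    """Return the candidate output length with the lowest energy per token."""
    return min(candidates, key=lambda n: energy_per_output_token(n_in, n))


# Sweep power-of-two output lengths for a short prompt.
lengths = [2 ** k for k in range(0, 15)]
sweet_spot = best_output_length(128, lengths)
```

In this toy setting, very short outputs are inefficient because the fixed and prefill costs are amortized over few tokens, while very long outputs are inefficient because each additional token attends over a growing context. Longer inputs raise the per-token cost at every output length. A purely linear model of total energy cannot produce this kind of interior minimum in per-token energy.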
