From Prompts to Power: Measuring the Energy Footprint of LLM Inference
Francisco Caravaca, Ángel Cuevas, Rubén Cuevas

TL;DR
This paper systematically measures and models the energy consumption of large language model inference across diverse architectures and hardware, highlighting environmental impacts and providing tools for awareness.
Contribution
It presents the first large-scale measurement study of LLM inference energy use, developing a predictive model and a browser extension to promote sustainable AI practices.
Findings
The study comprises over 32,500 measurements across 21 GPU configurations and 155 model architectures.
Energy demand varies significantly with architecture and operational factors.
The predictive model accurately estimates energy consumption for unseen models.
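To make "energy at the prompt level" concrete, here is a minimal sketch of how the energy of one inference request could be derived from sampled GPU power readings. It is not the authors' pipeline, which this summary does not describe. The function names, the sampling interval, and the power values are all hypothetical; the only technique shown is trapezoidal integration of power over time.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate power samples (watts) over time (seconds) with the trapezoidal rule.

    Returns energy in joules. Fewer than two samples yields 0.0.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have the same length")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def joules_to_wh(joules: float) -> float:
    """Convert joules to watt-hours (1 Wh = 3600 J)."""
    return joules / 3600.0


# Hypothetical power trace for a single prompt, sampled every 0.5 s.
sample_times = [0.0, 0.5, 1.0, 1.5, 2.0]
sample_power = [100.0, 300.0, 300.0, 300.0, 100.0]

prompt_energy_j = energy_joules(sample_times, sample_power)
print(f"{prompt_energy_j:.1f} J = {joules_to_wh(prompt_energy_j):.4f} Wh")
```

In a real setup the samples would come from a GPU power interface polled while the request is served, and idle power might be subtracted first. Both choices are assumptions here, not details taken from the paper.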
Abstract
The rapid expansion of Large Language Models (LLMs) has introduced unprecedented energy demands, extending beyond training to large-scale inference workloads that often dominate total lifecycle consumption. Deploying these models requires energy-intensive GPU infrastructure, and in some cases has even prompted plans to power data centers with nuclear energy. Despite this growing relevance, systematic analyses of inference energy consumption remain limited. In this work, we present a large-scale measurement-based study comprising over 32,500 measurements across 21 GPU configurations and 155 model architectures, from small open-source models to frontier systems. Using the vLLM inference engine, we quantify energy usage at the prompt level and identify how architectural and operational factors shape energy demand. Building on these insights, we develop a predictive model that accurately…
Taxonomy
Topics: Green IT and Sustainability · Big Data and Digital Economy · Advanced Neural Network Applications
