Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures
Mauricio Fadel Argerich, Jonathan Fürst, Marta Patiño-Martínez

TL;DR
This paper introduces Watt Counts, a comprehensive energy consumption dataset and benchmark for LLM inference on diverse GPU architectures, enabling energy-efficient deployment strategies.
Contribution
It provides the largest open-access energy dataset for LLMs and a reproducible benchmark to guide energy-aware hardware selection and deployment.
Findings
GPU choice significantly impacts energy efficiency for LLM inference.
Optimal hardware varies across models and scenarios.
Practitioners can reduce energy use by up to 70% with minimal performance loss.
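The trade-off in the last finding can be made concrete with a small selection rule. The sketch below is a hypothetical illustration, not the authors' method. It takes per-GPU measurements of energy per token and throughput for one model. It then picks the lowest-energy GPU whose throughput stays within a chosen slowdown budget of the fastest GPU. The `Measurement` fields, `pick_gpu`, `energy_saving`, and the 10% default budget are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One benchmark result for a (model, GPU) pair; field names are hypothetical."""
    gpu: str
    joules_per_token: float   # energy spent per generated token
    tokens_per_second: float  # throughput achieved in the same run


def pick_gpu(results: list[Measurement], max_slowdown: float = 0.1) -> Measurement:
    """Return the lowest-energy GPU whose throughput is within `max_slowdown`
    (a fraction, e.g. 0.1 = 10%) of the fastest GPU in `results`."""
    if not results:
        raise ValueError("no measurements given")
    best_tps = max(m.tokens_per_second for m in results)
    # Keep only configurations that lose at most `max_slowdown` of peak throughput.
    eligible = [m for m in results
                if m.tokens_per_second >= (1 - max_slowdown) * best_tps]
    # The fastest GPU always qualifies, so `eligible` is never empty here.
    return min(eligible, key=lambda m: m.joules_per_token)


def energy_saving(results: list[Measurement], chosen: Measurement) -> float:
    """Fractional energy saving of `chosen` relative to the fastest GPU."""
    fastest = max(results, key=lambda m: m.tokens_per_second)
    return 1 - chosen.joules_per_token / fastest.joules_per_token
```

Setting `max_slowdown` to 0 reduces the rule to always choosing the fastest GPU. Loosening it lets slower but more efficient hardware win, which is the kind of energy/performance trade-off the findings describe.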
Abstract
While the large energy consumption of Large Language Models (LLMs) is recognized by the community, system operators lack guidance for energy-efficient LLM inference deployments that leverage energy trade-offs of heterogeneous hardware due to a lack of energy-aware benchmarks and data. In this work we address this gap with Watt Counts: the largest open-access dataset of energy consumption of LLMs, with over 5,000 experiments for 50 LLMs across 10 NVIDIA Graphics Processing Units (GPUs) in batch and server scenarios along with a reproducible, open-source benchmark that enables community submissions to expand this dataset. Leveraging this dataset, we conduct a system-level study of LLM inference across heterogeneous GPU architectures and show that GPU selection is crucial for energy efficiency outcomes and that optimal hardware choices vary significantly across models and deployment…
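Energy-efficiency comparisons of this kind typically rest on converting sampled GPU power into energy and normalizing it by output. As a minimal sketch, assume power readings in watts sampled at known timestamps (for example, polled from NVIDIA's NVML) and a count of generated tokens. Energy can then be approximated with the trapezoidal rule and divided by the token count. The function names and the joules-per-token metric are illustrative assumptions. The abstract does not state which metric or sampling method Watt Counts uses.

```python
def energy_joules(timestamps_s: list[float], power_w: list[float]) -> float:
    """Integrate sampled power (W) over time (s) with the trapezoidal rule.

    Returns energy in joules. Coarser sampling makes the estimate less precise.
    """
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two aligned (time, power) samples")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between consecutive power samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def joules_per_token(timestamps_s: list[float], power_w: list[float],
                     tokens_generated: int) -> float:
    """Hypothetical efficiency metric: energy of a run divided by tokens produced."""
    if tokens_generated <= 0:
        raise ValueError("tokens_generated must be positive")
    return energy_joules(timestamps_s, power_w) / tokens_generated
```

Take hypothetical samples at 0, 1, and 2 s reading 100, 200, and 300 W. The estimate is 400 J, or 2 J/token if the run generated 200 tokens.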
