Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures

Mauricio Fadel Argerich; Jonathan Fürst; Marta Patiño-Martínez

arXiv:2604.09048 · cs.DC · April 13, 2026

TL;DR

This paper introduces Watt Counts, an energy-consumption dataset and benchmark for LLM inference across diverse NVIDIA GPU architectures, intended to support energy-efficient deployment strategies.

Contribution

It provides the largest open-access energy dataset for LLMs (over 5,000 experiments covering 50 LLMs on 10 NVIDIA GPUs) and a reproducible benchmark to guide energy-aware hardware selection and deployment.

Findings

1. GPU choice significantly impacts energy efficiency for LLM inference.
2. Optimal hardware varies across models and scenarios.
3. Practitioners can reduce energy use by up to 70% with minimal performance loss.

Abstract

While the large energy consumption of Large Language Models (LLMs) is recognized by the community, system operators lack guidance for energy-efficient LLM inference deployments that leverage energy trade-offs of heterogeneous hardware due to a lack of energy-aware benchmarks and data. In this work we address this gap with Watt Counts: the largest open-access dataset of energy consumption of LLMs, with over 5,000 experiments for 50 LLMs across 10 NVIDIA Graphics Processing Units (GPUs) in batch and server scenarios along with a reproducible, open-source benchmark that enables community submissions to expand this dataset. Leveraging this dataset, we conduct a system-level study of LLM inference across heterogeneous GPU architectures and show that GPU selection is crucial for energy efficiency outcomes and that optimal hardware choices vary significantly across models and deployment…
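The abstract does not say how per-experiment energy is derived. The sketch below is a minimal illustration that assumes energy is computed by integrating sampled GPU power over a run (trapezoidal rule) and normalizing by output tokens. It is an assumption and not necessarily the Watt Counts methodology. The `Run` record, its field names, the GPU labels, and all numbers are hypothetical. It shows why GPU selection matters. A GPU that draws more power can still be the better choice if it finishes the same work fast enough, so the comparison is made on joules per token rather than on watts.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical benchmark run: GPU power samples plus tokens generated."""
    gpu: str
    timestamps_s: list[float]  # sample times in seconds
    power_w: list[float]       # GPU board power (watts) at each sample time
    output_tokens: int


def energy_joules(timestamps_s, power_w):
    """Integrate power over time with the trapezoidal rule (W * s = J)."""
    if len(timestamps_s) != len(power_w) or len(timestamps_s) < 2:
        raise ValueError("need at least two aligned power samples")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def joules_per_token(run: Run) -> float:
    """Energy normalized by generated tokens (assumed efficiency metric)."""
    return energy_joules(run.timestamps_s, run.power_w) / run.output_tokens


def most_efficient(runs):
    """Return the run (and hence GPU) with the lowest energy per token."""
    return min(runs, key=joules_per_token)


# Hypothetical measurements for the same workload on two GPUs.
runs = [
    Run("gpu_a", [0.0, 1.0, 2.0], [300.0, 320.0, 310.0], 1000),
    Run("gpu_b", [0.0, 1.0, 2.0, 3.0], [150.0, 160.0, 155.0, 150.0], 900),
]
best = most_efficient(runs)
print(best.gpu, round(joules_per_token(best), 3))  # prints: gpu_b 0.517
```

Here `gpu_a` uses 625 J (0.625 J/token) and `gpu_b` uses 465 J (about 0.517 J/token). The lower-power GPU wins in this made-up case even though it runs longer. With different numbers the ranking could flip, which is consistent with the paper's point that the best hardware depends on the model and scenario.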
