Where Do the Joules Go? Diagnosing Inference Energy Consumption
Jae-Won Chung, Ruofan Wu, Jeff J. Ma, Mosharaf Chowdhury

TL;DR
This paper conducts a large-scale empirical study on inference energy consumption across diverse AI models and tasks, revealing key factors influencing energy use and proposing a framework to diagnose underlying causes.
Contribution
It presents a large-scale measurement of inference time and energy across 46 models, 7 tasks, and 1,858 configurations on NVIDIA H100 and B200 GPUs, and introduces a framework for reasoning about the mechanisms that govern time and energy consumption.
Findings
LLM task type can lead to 25× differences in energy.
Video generation sometimes consumes more than 100× the energy of image generation.
Differences in GPU utilization can result in 3–5× differences in energy.
Abstract
Energy is now a critical ML computing resource. While measuring energy consumption and observing trends is a valuable first step, accurately understanding and diagnosing why those differences occur is crucial for optimization. To that end, we begin by presenting a large-scale measurement study of inference time and energy across the generative AI landscape with 46 models, 7 tasks, and 1,858 different configurations on NVIDIA H100 and B200 GPUs. Our empirical findings span order-of-magnitude variations: LLM task type can lead to 25× energy differences, video generation sometimes consumes more than 100× the energy of images, and GPU utilization differences can result in 3–5× energy differences. Based on our observations, we present a framework for reasoning about the underlying mechanisms that govern time and energy consumption. The essence is that time and energy…
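The multiplicative comparisons quoted above (25×, 100×, 3–5×) are ratios of energy, and energy is average power multiplied by elapsed time. The sketch below only illustrates that arithmetic; it is not the paper's framework or measurement code. The function names `energy_joules` and `energy_ratio` and all numbers are hypothetical, chosen for illustration.

```python
def energy_joules(avg_power_watts: float, duration_s: float) -> float:
    """Energy (J) = average power (W) x elapsed time (s)."""
    return avg_power_watts * duration_s


def energy_ratio(a_joules: float, b_joules: float) -> float:
    """How many times more energy configuration A uses than configuration B."""
    return a_joules / b_joules


# Hypothetical numbers for illustration only (not taken from the paper).
slow = energy_joules(avg_power_watts=400.0, duration_s=10.0)
fast = energy_joules(avg_power_watts=500.0, duration_s=2.0)
ratio = energy_ratio(slow, fast)
print(f"slow={slow:.0f} J, fast={fast:.0f} J, ratio={ratio:.1f}x")
```

In this made-up case the configuration that draws more power still uses less energy because it finishes sooner, which is why time and energy need to be considered together.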
Taxonomy
Topics: Green IT and Sustainability · Big Data and Digital Economy · Parallel Computing and Optimization Techniques
