Empirically-Calibrated H100 Node Power Models for Reducing Uncertainty in AI Training Energy Estimation
Alex C. Newkirk, Jared Fernandez, Jonathan Koomey, Imran Latif, Emma Strubell, Arman Shehabi, Constantine Samaras

TL;DR
This paper develops empirically-calibrated power models for H100 GPU nodes during AI training, reducing uncertainty in energy estimation and revealing architecture-specific power signatures that impact grid stability.
Contribution
It introduces statistical models calibrated with empirical data to accurately predict AI training energy consumption, outperforming traditional TDP-based estimates.
Findings
Transformers exhibit distinct power fluctuation patterns.
The architecture-specific model achieves 11.4% mean absolute percentage error, versus 27-37% for TDP-based estimates.
Even computationally intensive workloads draw only 76% of the 10.2 kW node TDP rating.
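The arithmetic behind these findings can be sketched in a few lines. This is a minimal illustration, not the paper's model: the 10.2 kW TDP and the 76% figure come from the abstract, while the 24-hour run length and the helper names `tdp_energy_kwh` and `mape` are assumptions introduced here for the example.

```python
TDP_KW = 10.2             # node TDP rating, from the abstract
OBSERVED_FRACTION = 0.76  # share of TDP drawn by intensive workloads, from the abstract


def tdp_energy_kwh(hours, fraction=1.0):
    """Energy in kWh if the node drew a constant fraction of TDP (hypothetical helper)."""
    return TDP_KW * fraction * hours


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Assumed 24-hour training run; the duration is illustrative only.
hours = 24.0
naive_kwh = tdp_energy_kwh(hours)                      # assumes full TDP draw
scaled_kwh = tdp_energy_kwh(hours, OBSERVED_FRACTION)  # assumes 76% of TDP

# Overestimate from assuming full TDP when the node actually draws 76% of it.
overestimate_pct = mape([scaled_kwh], [naive_kwh])
print(f"naive: {naive_kwh:.1f} kWh, scaled: {scaled_kwh:.1f} kWh, "
      f"overestimate: {overestimate_pct:.1f}%")
```

Under these assumptions, a full-TDP estimate overstates energy by about 1/0.76 - 1, or roughly 32%. This is illustrative arithmetic only and does not reproduce the error figures reported in the paper, which come from the authors' measured data.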
Abstract
As AI's energy demand continues to grow, it is critical to enhance the understanding of characteristics of this demand, to improve grid infrastructure planning and environmental assessment. By combining empirical measurements from Brookhaven National Laboratory during AI training on 8-GPU H100 systems with open-source benchmarking data, we develop statistical models relating computational intensity to node-level power consumption. We measure the gap between manufacturer-rated thermal design power (TDP) and actual power demand during AI training. Our analysis reveals that even computationally intensive workloads operate at only 76% of the 10.2 kW TDP rating. Our architecture-specific model, calibrated to floating-point operations, predicts energy consumption with 11.4% mean absolute percentage error, significantly outperforming TDP-based approaches (27-37% error). We identified distinct…
Taxonomy
Topics: Software System Performance and Reliability · Software Reliability and Analysis Research
