Empirical Measurements of AI Training Power Demand on a GPU-Accelerated Node
Imran Latif, Alex C. Newkirk, Matthew R. Carbone, Arslan Munir, Yuewei Lin, Jonathan Koomey, Xi Yu, Zhihua Dong

TL;DR
This paper empirically measures the power draw of an 8-GPU NVIDIA H100 HGX node during AI training. It reports a peak of about 8.4 kW against a 10.2 kW rating and a fourfold reduction in ResNet training energy at larger batch sizes, results relevant to data center capacity planning.
Contribution
It provides the first detailed empirical power measurements of an NVIDIA H100 GPU node during AI training, highlighting energy consumption patterns and efficiency gains with batch size adjustments.
Findings
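Maximum observed power draw was approximately 8.4 kW, 18% below the manufacturer-rated 10.2 kW, even with GPUs near full utilization.
Increasing the ResNet batch size from 512 to 4096 images, with model architecture held constant, reduced total training energy consumption by a factor of 4.
The measurements can inform capacity planning by data center operators and energy-use estimates by researchers.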
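The gap between rated and observed power can be turned into a quick planning calculation. The sketch below is illustrative only: the two power figures come from the findings above, while the 100 kW budget and the helper names `headroom_fraction` and `nodes_within_budget` are assumptions introduced here, not anything from the paper. It also ignores cooling, networking, and safety margins that a real deployment would have to include.

```python
# Figures reported in the findings above.
RATED_KW = 10.2     # manufacturer-rated node power
OBSERVED_KW = 8.4   # approximate maximum observed draw during training


def headroom_fraction(rated_kw: float, observed_kw: float) -> float:
    """Fraction of the rated power that was never drawn at the observed peak."""
    return (rated_kw - observed_kw) / rated_kw


def nodes_within_budget(budget_kw: float, per_node_kw: float) -> int:
    """Whole number of nodes whose combined draw fits within the budget."""
    return int(budget_kw // per_node_kw)


headroom = headroom_fraction(RATED_KW, OBSERVED_KW)

# Hypothetical IT power budget, chosen only to illustrate the arithmetic.
budget_kw = 100.0
nodes_rated = nodes_within_budget(budget_kw, RATED_KW)
nodes_observed = nodes_within_budget(budget_kw, OBSERVED_KW)

print(f"Headroom below rating: {headroom:.1%}")
print(f"Nodes at rated power: {nodes_rated}")
print(f"Nodes at observed peak: {nodes_observed}")
```

Under these assumptions, provisioning on the observed peak rather than the nameplate rating fits more nodes into the same budget, which is why measured figures matter for capacity planning.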
Abstract
The expansion of artificial intelligence (AI) applications has driven substantial investment in computational infrastructure, especially by cloud computing providers. Quantifying the energy footprint of this infrastructure requires models parameterized by the power demand of AI hardware during training. We empirically measured the instantaneous power draw of an 8-GPU NVIDIA H100 HGX node during the training of an open-source image classifier (ResNet) and a large language model (Llama2-13b). The maximum observed power draw was approximately 8.4 kW, 18% lower than the manufacturer-rated 10.2 kW, even with GPUs near full utilization. Holding model architecture constant, increasing batch size from 512 to 4096 images for ResNet reduced total training energy consumption by a factor of 4. These findings can inform capacity planning for data center operators and energy use estimates by researchers.…
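Total training energy follows from instantaneous power samples by integrating power over time. The following is a minimal sketch of that step using the trapezoidal rule. It is not the authors' measurement pipeline: the function name `energy_kwh`, the 60-second sampling interval, and the constant-power trace are all assumptions made for illustration.

```python
from typing import Sequence


def energy_kwh(timestamps_s: Sequence[float], power_kw: Sequence[float]) -> float:
    """Integrate power samples (kW) over time (s) with the trapezoidal rule.

    Returns energy in kilowatt-hours.
    """
    if len(timestamps_s) != len(power_kw):
        raise ValueError("timestamps and power samples must have equal length")
    total_kw_s = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Average power across the interval times the interval duration.
        total_kw_s += 0.5 * (power_kw[i] + power_kw[i - 1]) * dt
    return total_kw_s / 3600.0  # kW*s -> kWh


# Hypothetical trace: one hour at a constant 8.4 kW, sampled every 60 s.
timestamps = [60.0 * k for k in range(61)]
power = [8.4] * len(timestamps)
one_hour_energy = energy_kwh(timestamps, power)
print(f"Energy over the trace: {one_hour_energy:.2f} kWh")
```

Applying the same integration to the traces of two complete training runs and taking the ratio of the results is one way to express a comparison such as the reported factor-of-4 energy difference between batch sizes.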
Taxonomy
Topics: Age of Information Optimization · IoT and Edge/Fog Computing
