Architectural Trade-offs in the Energy-Efficient Era: A Comparative Study of power-capping NVIDIA H100 and H200
Aditya Ujeniya, Jan Eitzinger, Georg Hager, Gerhard Wellein

TL;DR
This study compares the energy efficiency of NVIDIA H100 and H200 GPUs, focusing on how memory bandwidth differences influence power management and performance for compute- and memory-bound workloads.
Contribution
It provides a detailed analysis of memory power management and efficiency trade-offs between H100 and H200 architectures under various power caps.
Findings
H100 is more efficient for compute-bound workloads.
H200 outperforms H100 in memory-bound applications.
Memory power consumption varies significantly with power caps.
Abstract
Modern NVIDIA GPUs like the H100 (HBM2e) and H200 (HBM3e) share similar compute characteristics but differ significantly in memory interface technology and bandwidth. With memory bandwidth isolated as the key variable, the power distribution between the memory and the Streaming Multiprocessors (SMs) changes notably between the two architectures. In the era of energy-efficient computing, analyzing how these hardware characteristics impact performance per watt is critical. This study investigates how the H100 and H200 manage memory power consumption at various power-cap levels. Using regression analysis, we study the memory power limit and uncover outliers that consume more memory power. To evaluate efficiency, we employ compute-bound (DGEMM) and memory-bound (TheBandwidthBenchmark) workloads, representing the two extremes of the Roofline model. Our observations indicate that across varying power…
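The abstract describes DGEMM and TheBandwidthBenchmark as the two extremes of the Roofline model and uses performance per watt as its efficiency metric. A minimal Python sketch of both ideas follows. The function names and all numeric values are illustrative assumptions, not measurements or specifications from the paper, and this is not the authors' evaluation code.

```python
def roofline_gflops(intensity_flop_per_byte, peak_gflops, bandwidth_gbs):
    """Attainable performance (GFLOP/s) under the basic Roofline model:
    the lower of the compute ceiling and the memory-bandwidth ceiling."""
    return min(peak_gflops, intensity_flop_per_byte * bandwidth_gbs)


def gflops_per_watt(perf_gflops, power_watts):
    """Energy efficiency expressed as performance per watt."""
    return perf_gflops / power_watts


# Placeholder machine parameters (assumed for illustration, NOT H100/H200 specs).
PEAK_GFLOPS = 1000.0
BANDWIDTH_GBS = 100.0

# Low arithmetic intensity: limited by the bandwidth ceiling (memory-bound).
memory_bound = roofline_gflops(0.125, PEAK_GFLOPS, BANDWIDTH_GBS)

# High arithmetic intensity: limited by the compute ceiling (compute-bound).
compute_bound = roofline_gflops(50.0, PEAK_GFLOPS, BANDWIDTH_GBS)

# Efficiency at a hypothetical 250 W power draw.
efficiency = gflops_per_watt(compute_bound, 250.0)

print(memory_bound, compute_bound, efficiency)
```

In this simplified picture, raising the memory bandwidth lifts only the memory-bound result, while a power cap would enter through whatever peak rate, bandwidth, and power draw are actually measured under that cap.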
