Characterizing GPU Energy Usage in Exascale-Ready Portable Science Applications

William F. Godoy; Oscar Hernandez; Paul R. C. Kent; Maria Patrou; Kazi Asifuzzaman; Narasinga Rao Miniskar; Pedro Valero-Lara; Jeffrey S. Vetter; Matthew D. Sinclair; Jason Lowe-Power; Bobby R. Bruce

arXiv:2505.05623·cs.PF·November 27, 2025

TL;DR

This paper analyzes GPU energy consumption in two exascale-ready scientific applications, reports mixed-precision energy savings of 6-25% on QMCPACK and 45% on AMReX-Castro, and highlights gaps in AMD tooling, with the aim of informing future supercomputer design.

Contribution

The paper provides a detailed characterization of GPU energy usage for two scientific applications on NVIDIA A100, NVIDIA H100, and AMD MI250X GPUs, and explores application-specific metrics for energy vs. performance trade-offs.

Findings

01

Mixed-precision saves 6-25% energy on QMCPACK and 45% on AMReX-Castro.

02

Identifies gaps in AMD tooling on Frontier GPUs.

03

NVML results show little variability across query resolutions between 1 ms and 1 s.

Abstract

We characterize the GPU energy usage of two widely adopted exascale-ready applications representing two classes of particle and mesh solvers: (i) QMCPACK, a quantum Monte Carlo package, and (ii) AMReX-Castro, an adaptive mesh astrophysical code. We analyze power, temperature, utilization, and energy traces from double- and single (mixed)-precision benchmarks on NVIDIA's A100 and H100 and AMD's MI250X GPUs using queries in NVML and rocm_smi_lib, respectively. We explore application-specific metrics to provide insights on energy vs. performance trade-offs. Our results suggest that mixed-precision energy savings range between 6-25% on QMCPACK and reach 45% on AMReX-Castro. We also found gaps in the AMD tooling used on Frontier GPUs that need to be understood, while query resolutions on NVML show little variability between 1 ms and 1 s. Overall, application-level knowledge is crucial to define…
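As a rough illustration of how a sampled power trace can be turned into an energy figure and a relative saving, the sketch below integrates power over time with the trapezoidal rule. It is not the authors' method: the function names `energy_joules` and `savings_percent` and the sample values are hypothetical, and in practice the samples would come from a tool such as NVML or rocm_smi_lib rather than a hard-coded list.

```python
from typing import Sequence


def energy_joules(times_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Integrate a power trace (watts) over time (seconds) with the trapezoidal rule."""
    if len(times_s) != len(power_w):
        raise ValueError("times_s and power_w must have the same length")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        # Average of the two neighbouring samples times the interval between them.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def savings_percent(baseline_j: float, candidate_j: float) -> float:
    """Relative energy saving of a candidate run versus a baseline run, in percent."""
    if baseline_j <= 0:
        raise ValueError("baseline_j must be positive")
    return 100.0 * (baseline_j - candidate_j) / baseline_j


# Hypothetical traces sampled once per second (illustrative values, not measured data).
double_j = energy_joules([0.0, 1.0, 2.0], [100.0, 300.0, 100.0])
mixed_j = energy_joules([0.0, 1.0, 2.0], [100.0, 200.0, 100.0])
print(double_j, mixed_j, savings_percent(double_j, mixed_j))
```

The same two functions apply at any sampling interval, since the integration uses the recorded timestamps rather than assuming a fixed step.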
