Exploring the Frontiers of Energy Efficiency using Power Management at System Scale
Ahmad Maroof Karimi, Matthias Maiterth, Woong Shin, Naw Safrin Sattar, Hao Lu, Feiyi Wang

TL;DR
This paper presents a telemetry data-driven methodology to estimate energy savings from software-driven power management techniques like DVFS and Power Capping in exascale HPC systems, demonstrated on the Frontier supercomputer.
Contribution
It introduces a novel approach for establishing upper energy savings limits and applies it at scale to quantify potential benefits for HPC systems.
Findings
Up to 8.5% energy savings for certain jobs
Equivalent to 1438 MWh of energy saved
Methodology applicable to large-scale supercomputers
Abstract
In the face of surging power demands for exascale HPC systems, this work tackles the critical challenge of understanding the impact of software-driven power management techniques like Dynamic Voltage and Frequency Scaling (DVFS) and Power Capping. These techniques have been actively developed over the past few decades. By combining insights from GPU benchmarking to understand application power profiles, we present a telemetry data-driven approach for deriving energy savings projections. This approach has been demonstrably applied to the Frontier supercomputer at scale. Our findings based on three months of telemetry data indicate that, for certain resource-constrained jobs, significant energy savings (up to 8.5%) can be achieved without compromising performance. This translates to a substantial cost reduction, equivalent to 1438 MWh of energy saved. The key contribution of this work…
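The abstract describes projecting energy savings from job telemetry but does not spell out the procedure here. The following is only a minimal sketch of what such a projection could look like, not the authors' method. The `JobTelemetry` record, the `resource_constrained` flag, and the sample values are hypothetical. The 8.5% figure is reused purely as an illustrative upper-bound savings fraction.

```python
from dataclasses import dataclass
from typing import List

JOULES_PER_MWH = 3.6e9  # 1 MWh = 3.6e9 J


@dataclass
class JobTelemetry:
    """Hypothetical per-job telemetry record (not the paper's schema)."""
    job_id: str
    power_watts: List[float]    # sampled power draw, one value per interval
    interval_s: float           # sampling interval in seconds
    resource_constrained: bool  # assumed label marking jobs eligible for savings


def job_energy_mwh(job: JobTelemetry) -> float:
    """Integrate sampled power over time (rectangle rule) and convert to MWh."""
    joules = sum(job.power_watts) * job.interval_s
    return joules / JOULES_PER_MWH


def projected_savings_mwh(jobs: List[JobTelemetry], savings_fraction: float) -> float:
    """Apply an assumed savings fraction to eligible jobs only."""
    return sum(
        job_energy_mwh(job) * savings_fraction
        for job in jobs
        if job.resource_constrained
    )


# Illustrative data: two one-hour jobs sampled once per second at 1 MW each.
jobs = [
    JobTelemetry("job-a", [1.0e6] * 3600, 1.0, resource_constrained=True),
    JobTelemetry("job-b", [1.0e6] * 3600, 1.0, resource_constrained=False),
]

# 0.085 is used here only as an example upper-bound fraction.
savings = projected_savings_mwh(jobs, savings_fraction=0.085)
```

Only `job-a` contributes to the projection, since `job-b` is not marked as eligible. A real analysis would need a measured, per-application savings fraction rather than a single constant.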
Taxonomy
Topics: Power Systems and Renewable Energy
