TL;DR
COUNTDOWN is a runtime library that reduces energy consumption during communication idle times in MPI applications without significantly impacting execution time, tested on real HPC systems.
Contribution
It introduces a methodology and tool for transparent, performance-neutral energy savings in MPI applications during communication idle periods.
Findings
Saves 6-50% energy on NAS benchmarks with <5% time increase.
Achieves 22.36% energy savings on Quantum ESPRESSO with <3% performance penalty.
Energy savings rise to 37%, at the cost of a higher performance penalty, when communication tuning is disabled.
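To put the reported figures in perspective, the following is a back-of-the-envelope sketch, not a calculation taken from the paper. It assumes energy equals average power times execution time, so the ratio of average power after and before is the energy ratio divided by the time ratio. The function name `average_power_ratio` is hypothetical.

```python
def average_power_ratio(energy_saving: float, time_increase: float) -> float:
    """Ratio of average power (after / before), assuming E = P_avg * T.

    energy_saving: fractional energy reduction, e.g. 0.2236 for 22.36%.
    time_increase: fractional execution-time increase, e.g. 0.03 for 3%.
    """
    energy_ratio = 1.0 - energy_saving   # E_after / E_before
    time_ratio = 1.0 + time_increase     # T_after / T_before
    return energy_ratio / time_ratio     # P_after / P_before


if __name__ == "__main__":
    # Quantum ESPRESSO figures from the findings, using the 3% upper bound
    # on the time penalty (the actual penalty is reported only as "<3%").
    ratio = average_power_ratio(0.2236, 0.03)
    print(f"average power ratio: {ratio:.4f}")
```

Under these assumptions, a 22.36% energy saving with a time penalty somewhere between 0% and 3% corresponds to an average power reduction of roughly 22% to 25%.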
Abstract
Power and energy consumption are becoming key challenges to successfully deploying the first exascale supercomputer. Large-scale HPC applications waste a significant amount of power in communication and synchronization-related idle times. However, due to the time scale at which communication happens, transitioning into low-power states during communication idle times may introduce unacceptable overhead in applications' execution time. In this paper, we present COUNTDOWN, a runtime library, supported by a methodology and analysis tool, for identifying and automatically reducing the power consumption of the computing elements during communication and synchronization. COUNTDOWN saves energy without imposing a significant time-to-completion increase by lowering CPUs' power consumption only during idle times for which the power state transition overhead is negligible. This is done transparently to the…
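The idea of lowering power only during idle times long enough to amortize the transition can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the library's implementation. It assumes a timeout-style policy in which the core stays at high power for the first `timeout_s` seconds of each idle phase and drops to low power only if the phase lasts longer. The function name, the timeout value, and the power figures are all hypothetical, and power-state transitions are modeled as instantaneous and free.

```python
def simulate_idle_energy(idle_durations_s, timeout_s, p_high_w, p_low_w):
    """Return (baseline_j, policy_j): energy spent in idle phases only.

    baseline_j: every idle phase is spent entirely at high power.
    policy_j:   each idle phase runs at high power until timeout_s elapses,
                then at low power for the remainder (hypothetical policy).
    """
    baseline_j = 0.0
    policy_j = 0.0
    for duration in idle_durations_s:
        baseline_j += p_high_w * duration
        if duration <= timeout_s:
            # Short phase: the timeout never fires, so no transition occurs.
            policy_j += p_high_w * duration
        else:
            # Long phase: high power until the timeout, low power afterwards.
            policy_j += p_high_w * timeout_s
            policy_j += p_low_w * (duration - timeout_s)
    return baseline_j, policy_j


if __name__ == "__main__":
    # Illustrative values only: one short (100 us) and one long (10 ms) phase.
    phases = [0.0001, 0.01]
    base, pol = simulate_idle_energy(phases, timeout_s=0.0005,
                                     p_high_w=100.0, p_low_w=40.0)
    print(f"baseline: {base:.3f} J, with policy: {pol:.3f} J")
```

In this toy model, short idle phases are left untouched, so they pay no transition cost, while long phases recover most of the available saving. A real runtime would also have to account for transition latency and for the cost of arming and cancelling the timer.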
