COUNTDOWN Slack: a Run-time Library to Reduce Energy Footprint in Large-scale MPI Applications
Daniele Cesarini, Andrea Bartolini, Andrea Borghesi, Carlo Cavazzoni, Mathieu Luisier, Luca Benini

TL;DR
COUNTDOWN Slack is a run-time library that reduces energy consumption in large-scale MPI applications by exploiting communication slack without altering application code, achieving significant energy savings with minimal performance overhead.
Contribution
It introduces a novel approach separating communication phases and using a timeout algorithm to enable performance-neutral energy savings in MPI applications.
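As a rough illustration of what a timeout policy can look like, the sketch below models one plausible reading: a core stays at full power when it enters a communication wait and drops to a low-power state only if the wait outlasts a fixed timeout. This is an assumption made for illustration, not the library's actual implementation. The function names, the power figures, and the wait durations are all hypothetical, and the model ignores frequency-transition latency.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class PhaseSplit:
    """Time spent at each power level during one communication wait."""
    high_power_s: float
    low_power_s: float


def apply_timeout(wait_s: float, timeout_s: float) -> PhaseSplit:
    """Split one wait into high-power and low-power time (hypothetical model).

    Waits shorter than the timeout never trigger a power-state change,
    so short communication phases pay no transition cost.
    """
    if wait_s <= timeout_s:
        return PhaseSplit(high_power_s=wait_s, low_power_s=0.0)
    return PhaseSplit(high_power_s=timeout_s, low_power_s=wait_s - timeout_s)


def estimate_energy(waits_s: Iterable[float], timeout_s: float,
                    p_high_w: float, p_low_w: float) -> float:
    """Energy in joules spent waiting, under the timeout model above."""
    energy_j = 0.0
    for wait_s in waits_s:
        split = apply_timeout(wait_s, timeout_s)
        energy_j += split.high_power_s * p_high_w + split.low_power_s * p_low_w
    return energy_j


if __name__ == "__main__":
    # Illustrative values only, not measurements from the paper.
    waits = [0.25, 2.0, 5.0]
    baseline = sum(waits) * 100.0
    with_timeout = estimate_energy(waits, timeout_s=0.5,
                                   p_high_w=100.0, p_low_w=60.0)
    print(f"baseline: {baseline} J, with timeout: {with_timeout} J")
```

In this model the timeout acts as a filter: a short wait costs exactly what it would without any power management, and only the tail of a long wait is spent at reduced power.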
Findings
Achieves an average of 10% energy savings in scientific applications.
Reduces energy consumption by up to 22% in large-scale runs.
Maintains overhead below 1% in typical scenarios.
Abstract
The power consumption of supercomputers is a major challenge for system owners, users, and society. It limits the capacity of system installations, requires large cooling infrastructures, and causes a large carbon footprint. Reducing power during application execution without changing the application source code or increasing time-to-completion is highly desirable in real-life high-performance computing scenarios. The power management run-time frameworks proposed in the last decade are based on the assumption that the duration of communication and application phases in an MPI application can be predicted and used at run-time to trade off communication slack against power consumption. In this manuscript, we first show that this assumption is too general and leads to mispredictions, slowing down applications, thereby jeopardizing the claimed benefits. We then propose a new…
