Cross-layer Application-aware Power/Energy Management for Extreme Scale Science
Ivan Rodero, Manish Parashar

TL;DR
This paper discusses the need for a comprehensive, cross-layer, application-aware approach to power and energy management in extreme-scale HPC systems to achieve significant efficiency improvements while maintaining performance and reliability.
Contribution
It introduces a novel cross-layer, application-aware framework for power and energy management tailored for exascale HPC systems, addressing multiple objectives and tradeoffs.
Findings
The paper highlights the limitations of current HPC power solutions.
It proposes a multi-layer, application-aware management strategy.
It emphasizes the importance of integrated approaches for exascale energy efficiency.
Abstract
High Performance Computing (HPC) has evolved over the past decades into increasingly complex and powerful systems. Current HPC systems consume several MW of power, enough to power small towns, and are in fact fast approaching the limits of the power available to them. Estimates suggest that, with current technology, achieving exascale will require hundreds of MW, which is not feasible from multiple perspectives. Architecture and technology researchers are aggressively addressing this; however, as history has shown, innovations at these levels are not sufficient and have to be accompanied by innovations at higher levels (algorithms, programming, runtime, OS) to achieve the multiple orders of magnitude reduction - i.e., a comprehensive cross-layer and application-aware strategy is required. Furthermore, energy/power-efficiency has to be addressed in combination with quality of…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Distributed and Parallel Computing Systems
