Toward an End-to-End Auto-tuning Framework in HPC PowerStack
Xingfu Wu, Aniruddha Marathe, Siddhartha Jana, Ondrej Vysocky, Jophin John, Andrea Bartolini, Lubomir Riha, Michael Gerndt, Valerie Taylor, and Sridutt Bhalachandra

TL;DR
This survey paper proposes an end-to-end auto-tuning framework for the HPC PowerStack to optimize the power and performance of high-performance computing systems under power and energy constraints.
Contribution
It introduces a novel auto-tuning framework that spans the entire PowerStack, highlights opportunities for co-tuning its layers, and outlines future research challenges.
Findings
Proposed a comprehensive end-to-end auto-tuning framework
Identified opportunities for co-tuning across PowerStack layers
Outlined research challenges for multi-layer auto-tuning
Abstract
Efficiently utilizing procured power and optimizing performance of scientific applications under power and energy constraints are challenging. The HPC PowerStack defines a software stack to manage power and energy of high-performance computing systems and standardizes the interfaces between different components of the stack. This survey paper presents the findings of a working group focused on the end-to-end tuning of the PowerStack. First, we provide background on the PowerStack layer-specific tuning efforts in terms of their high-level objectives, the constraints and optimization goals, layer-specific telemetry, and control parameters, and we list the existing software solutions that address those challenges. Second, we propose the PowerStack end-to-end auto-tuning framework, identify the opportunities in co-tuning different layers in the PowerStack, and present specific use cases…
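The idea of co-tuning layers under a power constraint can be illustrated with a small sketch. This is not the authors' framework: the abstract does not specify a search strategy, and the two control parameters (a node power cap at the job/node layer and a thread count at the application layer), the toy `measure` model, and the names `co_tune`, `NODE_POWER_CAPS_W`, and `THREAD_COUNTS` are all hypothetical. The sketch searches both layers' parameters jointly and keeps the fastest configuration whose power stays within a budget, because the best thread count depends on the chosen cap.

```python
from itertools import product

# Hypothetical control parameters for two PowerStack layers (illustrative only).
NODE_POWER_CAPS_W = [150, 200, 250]  # job/node layer: per-node power cap in watts
THREAD_COUNTS = [16, 32, 64]         # application layer: thread count


def measure(power_cap_w, threads):
    """Toy stand-in for a real run (hypothetical model, not measured data).

    Returns (runtime_s, avg_power_w). A real tuner would launch the
    application and read power/performance telemetry instead.
    """
    demand_w = 80 + 2.5 * threads          # power this config would like to draw
    power_w = min(demand_w, power_cap_w)   # the cap clips the actual draw
    # Runtime scales inversely with threads and with the fraction of
    # requested power that the cap actually delivers.
    runtime_s = 1000.0 * demand_w / (threads * power_w)
    return runtime_s, power_w


def co_tune(budget_w):
    """Exhaustively search both layers' parameters jointly.

    Returns (runtime_s, power_cap_w, threads, power_w) for the fastest
    configuration whose power is within budget_w, or None if none fits.
    """
    best = None
    for cap, threads in product(NODE_POWER_CAPS_W, THREAD_COUNTS):
        runtime_s, power_w = measure(cap, threads)
        if power_w > budget_w:
            continue  # violates the power constraint
        if best is None or runtime_s < best[0]:
            best = (runtime_s, cap, threads, power_w)
    return best
```

Under this toy model, `co_tune(200)` selects a 200 W cap with 64 threads, while `co_tune(150)` falls back to a 150 W cap with 64 threads. Exhaustive search is used only for clarity; a realistic tuner over many parameters would need a sampling or model-based search.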
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems
