On the Convergence of Malleability and the HPC PowerStack: Exploiting Dynamism in Over-Provisioned and Power-Constrained HPC Systems
Eishi Arima, Isaías A. Comprés, Martin Schulz

TL;DR
This paper discusses the integration of resource malleability, co-scheduling, and power management techniques to optimize resource utilization and energy efficiency in future over-provisioned, power-constrained HPC systems, highlighting ongoing efforts in the HPC PowerStack initiative.
Contribution
It explores the relationships between malleability, co-scheduling, and power management, proposing a unified approach for future HPC system management.
Findings
Assessment of synergies between resource management techniques
Discussion of software requirements for integrated management
Introduction of ongoing integration efforts in HPC PowerStack
Abstract
Recent High-Performance Computing (HPC) systems face important challenges, such as massive power consumption combined with significantly under-utilized system resources. Given the power consumption trends, future systems will be deployed in an over-provisioned manner, where more resources are installed than can be powered simultaneously. In such a scenario, maximizing resource utilization and energy efficiency while staying within a given power constraint is pivotal. Driven by this observation, in this position paper we first highlight the recent trends in resource management techniques, with a particular focus on malleability support (i.e., dynamically scaling resource allocations/requirements for a job), co-scheduling (i.e., co-locating multiple jobs within a node), and power management. Second, we consider putting them together, assess their…
