Design Principles of Dynamic Resource Management for High-Performance Parallel Programming Models
Dominik Huber, Martin Schreiber, Martin Schulz, Howard Pritchard, Daniel Holmes

TL;DR
This paper discusses the design principles for implementing dynamic resource management in high-performance computing, emphasizing the need for standardized interfaces and a holistic approach to improve system efficiency and flexibility.
Contribution
It introduces a set of design principles for dynamic resource management in HPC and presents a prototype implementation using MPI.
Findings
Survey of existing approaches to DRM in HPC
Proposed design principles for DRM
Prototype implementation demonstrating feasibility
Abstract
With Dynamic Resource Management (DRM), the resources assigned to a job can be changed dynamically during its execution. From the system's perspective, DRM opens a new level of flexibility in resource allocation and job scheduling and therefore has the potential to improve system efficiency metrics such as the utilization rate, job throughput, energy efficiency, and responsiveness. From the application perspective, users can tailor the resources they request to their needs, offering potential optimizations in queuing time or charged costs. Despite these obvious advantages and many attempts over the last decade to establish DRM in HPC, it remains a concept discussed in academia rather than being successfully deployed on production systems. This stems from the fact that support for DRM requires changes in all the layers of the HPC system software stack including applications, programming…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
