Hybrid Workload Scheduling on HPC Systems
Yuping Fan, Paul Rich, William Allcock, Michael Papka, and Zhiling Lan

TL;DR
This paper proposes and evaluates new scheduling mechanisms for efficiently managing hybrid workloads, including on-demand, rigid, and malleable jobs, on a single HPC system to improve responsiveness and overall system performance.
Contribution
It introduces novel scheduling strategies for co-scheduling diverse HPC workloads on a single system, addressing responsiveness, malleability incentives, and performance tradeoffs.
Findings
Proposed mechanisms effectively reduce on-demand request delays.
Users gain stronger incentives to declare jobs as malleable.
Overall system performance is improved under various workloads.
Abstract
Traditionally, on-demand, rigid, and malleable applications have been scheduled and executed on separate systems. Ever-growing workload demands and rapidly developing HPC infrastructure have spurred interest in converging these applications on a single HPC system. Although allocating hybrid workloads within one system could improve system efficiency, it is difficult to balance the responsiveness of on-demand requests, the incentive for malleable jobs, and the performance of rigid applications. In this study, we present several scheduling mechanisms to address the issues involved in co-scheduling on-demand, rigid, and malleable jobs on a single HPC system. We extensively evaluate and compare their performance under various configurations and workloads. Our experimental results show that our proposed mechanisms are capable of serving on-demand…
Taxonomy
Topics: Distributed and Parallel Computing Systems · Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques
