Duration-Informed Workload Scheduler
Daniela Loreti, Davide Leone, Andrea Borghesi

TL;DR
This paper presents a workload scheduler for high-performance computing that uses machine learning to predict job durations, reducing mean waiting time by around 11%.
Contribution
The paper introduces a workload scheduler with an integrated machine learning-based duration prediction module, so that scheduling decisions can rely on estimated job durations before the jobs run.
Findings
Mean waiting time across all jobs decreased by approximately 11%.
Performance was evaluated on workload traces from a Tier-0 supercomputer.
Lower waiting times correspond to better quality of service from the users' point of view.
Abstract
High-performance computing systems are complex machines whose behaviour is governed by the correct functioning of their many subsystems. Among these, the workload scheduler has a crucial impact on the timely execution of the jobs continuously submitted to the computing resources. Making high-quality scheduling decisions is contingent on knowing the duration of submitted jobs before their execution, a non-trivial task for users that can be tackled with Machine Learning. In this work, we devise a workload scheduler enhanced with a duration prediction module built via Machine Learning. We evaluate its effectiveness and show its performance using workload traces from a Tier-0 supercomputer, demonstrating a decrease in mean waiting time across all jobs of around 11%. Lower waiting times are directly connected to better quality of service from the users' point of view and higher turnaround…
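To illustrate why duration estimates can matter for waiting time, here is a minimal sketch in Python. It is not the scheduler described in the paper: it assumes a single resource, jobs that are all submitted at the same instant, and a hypothetical `predicted_duration` field standing in for the output of a duration predictor. The three jobs are invented toy data, and the ordering policy (shortest predicted job first) is an assumption chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    true_duration: float       # seconds; only known once the job has run
    predicted_duration: float  # seconds; hypothetical predictor output


def mean_waiting_time(order):
    """Mean queueing time when jobs run back to back on one resource."""
    clock = 0.0
    total_wait = 0.0
    for job in order:
        total_wait += clock          # the job waits for all earlier jobs
        clock += job.true_duration   # the resource is busy for the real duration
    return total_wait / len(order)


def fcfs(jobs):
    """First come, first served: keep the submission order."""
    return list(jobs)


def shortest_predicted_first(jobs):
    """Order jobs by predicted duration (an assumed policy, not the paper's)."""
    return sorted(jobs, key=lambda job: job.predicted_duration)


# Toy workload, invented for this example.
jobs = [
    Job("a", true_duration=100.0, predicted_duration=90.0),
    Job("b", true_duration=10.0, predicted_duration=15.0),
    Job("c", true_duration=50.0, predicted_duration=40.0),
]

baseline = mean_waiting_time(fcfs(jobs))
informed = mean_waiting_time(shortest_predicted_first(jobs))
print(f"FCFS mean wait: {baseline:.2f} s")
print(f"Duration-informed mean wait: {informed:.2f} s")
```

In this toy case the predictions are imperfect but still rank the jobs correctly, so the informed ordering lowers the mean wait. The size of the gap here says nothing about the roughly 11% reduction reported on real traces, which depends on the actual workload, predictor, and scheduling policy.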
