Dynamic Fractional Resource Scheduling vs. Batch Scheduling
Henri Casanova (CoRG), Mark Stillwell (LIP, INRIA Grenoble Rhône-Alpes / LIP Laboratoire de l'Informatique du Parallélisme), Frédéric Vivien (LIP, INRIA Grenoble Rhône-Alpes / LIP Laboratoire de l'Informatique du Parallélisme)

TL;DR
This paper introduces a job scheduling method that uses virtual machine technology to share fractional node resources in homogeneous HPC clusters, improving job stretch by orders of magnitude over traditional batch scheduling.
Contribution
It presents VM-based scheduling heuristics for the online, non-clairvoyant problem that optimize an objective metric correlated with job performance, derives absolute performance bounds, and reports comparable or better resource utilization.
Findings
Orders-of-magnitude improvement in job stretch over batch scheduling
Comparable or better resource utilization
Effective in both synthetic and real HPC workloads
Abstract
We propose a novel job scheduling approach for homogeneous cluster computing platforms. Its key feature is the use of virtual machine technology to share fractional node resources in a precise and controlled manner. Other VM-based scheduling approaches have focused primarily on technical issues or on extensions to existing batch scheduling systems, while we take a more aggressive approach and seek to find heuristics that maximize an objective metric correlated with job performance. We derive absolute performance bounds and develop algorithms for the online, non-clairvoyant version of our scheduling problem. We further evaluate these algorithms in simulation against both synthetic and real-world HPC workloads and compare our algorithms to standard batch scheduling approaches. We find that our approach improves over batch scheduling by orders of magnitude in terms of job stretch, while…
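To make the stretch metric concrete, here is a minimal sketch that assumes the common definition of stretch: a job's turnaround time divided by the time it would take on dedicated resources. The paper's exact metric may differ (for example, a bounded variant). The `Job` class, the function names, and the two-job scenario are hypothetical illustrations and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Job:
    release: float            # submission time (s)
    completion: float         # time the job finished (s)
    dedicated_runtime: float  # run time on dedicated resources (s)


def stretch(job: Job) -> float:
    """Turnaround time relative to the dedicated run time (assumed definition)."""
    return (job.completion - job.release) / job.dedicated_runtime


def max_stretch(jobs: list[Job]) -> float:
    """Worst-case stretch over a set of jobs."""
    return max(stretch(j) for j in jobs)


# Hypothetical scenario: one node, a 1000 s job and a 10 s job, both released at t=0.

# Batch, long job first: the short job waits 1000 s, then runs for 10 s.
batch = [
    Job(release=0, completion=1000, dedicated_runtime=1000),
    Job(release=0, completion=1010, dedicated_runtime=10),
]

# Fractional sharing: each job gets 50% of the CPU until the short job
# finishes at t=20; the long job then runs alone with 990 s of work left.
shared = [
    Job(release=0, completion=1010, dedicated_runtime=1000),
    Job(release=0, completion=20, dedicated_runtime=10),
]

print(max_stretch(batch))   # 101.0
print(max_stretch(shared))  # 2.0
```

In this toy case the long job finishes only 10 s later under sharing, while the short job's stretch drops from 101 to 2. This is the kind of effect that fractional allocation is meant to exploit.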
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Parallel Computing and Optimization Techniques
