Leveraging Multi-Instance GPUs through moldable task scheduling
Jorge Villarrubia, Luis Costero, Francisco D. Igual, Katzalin Olcoz

TL;DR
This paper introduces a novel scheduling algorithm, FAR, for multi-task execution on NVIDIA MIG GPUs, achieving near-optimal makespan with significant improvements over existing methods.
Contribution
The work presents a new 3-phase algorithm for moldable task scheduling on MIG GPUs, addressing reconfiguration costs and providing strong approximation guarantees.
Findings
FAR achieves a makespan within 1.22x of the optimal in real experiments.
The algorithm outperforms state-of-the-art scheduling methods.
Reconfiguration-aware scheduling significantly improves GPU utilization.
Abstract
NVIDIA MIG (Multi-Instance GPU) allows partitioning a physical GPU into multiple logical instances with fully isolated resources, which can be dynamically reconfigured. This work highlights the untapped potential of MIG through moldable task scheduling with dynamic reconfigurations. Specifically, we propose a makespan minimization problem for multi-task execution under MIG constraints. Our profiling shows that assuming monotonicity in task work with respect to resources, as is usual in multicore scheduling, is not viable. Relying on a state-of-the-art proposal that does not require such an assumption, we present FAR, a 3-phase algorithm to solve the problem. Phase 1 of FAR builds on a classical task moldability method, phase 2 combines Longest Processing Time First and List Scheduling with a novel repartitioning tree heuristic tailored to MIG constraints, and phase 3 employs local search…
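To make the phase 2 building blocks more concrete, the sketch below shows the textbook combination of Longest Processing Time First (LPT) and List Scheduling on identical machines. It is not FAR's phase 2: it ignores MIG instance sizes, moldability, reconfiguration costs, and the repartitioning tree heuristic. The function name `lpt_list_schedule` and the sample durations are hypothetical and chosen only for illustration.

```python
import heapq


def lpt_list_schedule(durations, num_machines):
    """Classic LPT + list scheduling on identical machines (illustrative only).

    Returns (makespan, assignment), where assignment[m] lists the task
    indices placed on machine m in the order they were scheduled.
    """
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")

    # LPT: consider tasks in order of decreasing duration.
    order = sorted(range(len(durations)), key=lambda i: durations[i], reverse=True)

    # List scheduling: always give the next task to the least-loaded machine.
    # The min-heap holds (current load, machine id) pairs.
    heap = [(0.0, m) for m in range(num_machines)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_machines)]

    for i in order:
        load, m = heapq.heappop(heap)
        assignment[m].append(i)
        heapq.heappush(heap, (load + durations[i], m))

    # The makespan is the largest final load across machines.
    makespan = max(load for load, _ in heap)
    return makespan, assignment


if __name__ == "__main__":
    # Hypothetical task durations, two identical machines.
    makespan, assignment = lpt_list_schedule([7, 5, 4, 3, 3, 2], 2)
    print(makespan, assignment)
```

In the MIG setting described in the abstract, the "machines" are GPU instances of different sizes that can be repartitioned at a cost, which is the gap the repartitioning tree heuristic is meant to address.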
