Serving DNN Models with Multi-Instance GPUs: A Case of the Reconfigurable Machine Scheduling Problem
Cheng Tan, Zhichao Li, Jian Zhang, Yu Cao, Sikai Qi, Zherui Liu, Yibo Zhu, Chuanxiong Guo

TL;DR
This paper introduces MIG-serving, an algorithm pipeline for efficiently partitioning NVIDIA A100 GPUs for DNN serving, significantly reducing GPU usage while maintaining throughput, by solving a new NP-hard scheduling problem.
Contribution
The paper defines the Reconfigurable Machine Scheduling Problem (RMS) and proposes MIG-serving, a new solution that combines several algorithms to find efficient GPU partitions for DNN serving.
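The "reconfigurable machine" in RMS is a GPU whose partition can be changed. As a rough illustration of why the search space is large, the Python sketch below enumerates the ways one A100 could be split into instances. It assumes the MIG compute-slice sizes 1, 2, 3, 4 and 7 and treats any combination totaling at most 7 slices as legal. That is a simplification of NVIDIA's actual placement rules, and the names `INSTANCE_SIZES` and `enumerate_configs` are hypothetical.

```python
from itertools import combinations_with_replacement

# Assumption: compute-slice sizes of the A100 MIG profiles 1g, 2g, 3g, 4g, 7g.
# Simplification: a configuration is "valid" if its slices sum to at most 7.
# NVIDIA's real placement rules are stricter, so this over-approximates.
INSTANCE_SIZES = (1, 2, 3, 4, 7)
TOTAL_SLICES = 7


def enumerate_configs(sizes=INSTANCE_SIZES, total=TOTAL_SLICES):
    """Return every non-empty multiset of instance sizes that fits on one GPU.

    Each configuration is a tuple sorted from largest to smallest instance.
    """
    ordered = sorted(sizes, reverse=True)
    configs = []
    # At most total // min(sizes) instances can fit on one GPU.
    for k in range(1, total // min(sizes) + 1):
        for combo in combinations_with_replacement(ordered, k):
            if sum(combo) <= total:
                configs.append(combo)
    return configs


if __name__ == "__main__":
    configs = enumerate_configs()
    print(len(configs), "candidate configurations per GPU")
    print([c for c in configs if sum(c) == TOTAL_SLICES])
```

Even under this relaxed model each GPU has dozens of candidate configurations, and a cluster-wide plan must choose one per GPU while also deciding which model runs on each instance.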
Findings
MIG-serving can save up to 40% of GPUs compared to default A100 usage.
The chosen GPU partitions still meet the models' throughput requirements.
Experimental results validate the efficiency of the proposed algorithms.
Abstract
Multi-Instance GPU (MIG) is a new feature introduced by NVIDIA A100 GPUs that partitions one physical GPU into multiple GPU instances. With MIG, A100 can be the most cost-efficient GPU ever for serving Deep Neural Networks (DNNs). However, discovering the most efficient GPU partitions is challenging. The underlying problem is NP-hard; moreover, it is a new abstract problem, which we define as the Reconfigurable Machine Scheduling Problem (RMS). This paper studies serving DNNs with MIG, a new case of RMS. We further propose a solution, MIG-serving. MIG-serving is an algorithm pipeline that blends a variety of newly designed algorithms and customized classic algorithms, including a heuristic greedy algorithm, Genetic Algorithm (GA), and Monte Carlo Tree Search algorithm (MCTS). We implement MIG-serving on Kubernetes. Our experiments show that compared to using A100 as-is, MIG-serving can…
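The abstract lists a heuristic greedy algorithm among MIG-serving's components but does not describe it. The sketch below is not the authors' algorithm; it is a hypothetical stand-in showing the kind of decision such a heuristic makes. For each model it picks the instance size with the best throughput per slice, allocates enough instances to meet the demand, and then packs the instances onto GPUs with first-fit decreasing bin packing. The throughput table, the demands, and the function `greedy_plan` are invented for illustration, and MIG placement rules are reduced to a plain 7-slice capacity per GPU.

```python
import math

# Hypothetical throughput (requests/s) of each model on each instance size,
# keyed by compute slices. These numbers are invented for illustration.
THROUGHPUT = {
    "model_a": {1: 100, 2: 210, 3: 300, 4: 380, 7: 600},
    "model_b": {1: 40, 2: 90, 3: 150, 4: 190, 7: 330},
}


def greedy_plan(demand, throughput, total_slices=7):
    """Hypothetical greedy heuristic (not the paper's algorithm).

    1. For each model, choose the instance size with the highest
       throughput per slice and allocate enough instances to meet demand.
    2. Pack all instances onto GPUs with first-fit decreasing by size.
    Returns a list of GPUs, each a list of (model, size) instances.
    """
    instances = []
    for model, required in demand.items():
        table = throughput[model]
        size = max(table, key=lambda s: table[s] / s)
        count = math.ceil(required / table[size])
        instances.extend([(model, size)] * count)

    # First-fit decreasing: place larger instances first.
    instances.sort(key=lambda inst: inst[1], reverse=True)
    gpus = []  # each entry is [free_slices, placed_instances]
    for model, size in instances:
        for gpu in gpus:
            if gpu[0] >= size:
                gpu[0] -= size
                gpu[1].append((model, size))
                break
        else:  # no existing GPU has room, so open a new one
            gpus.append([total_slices - size, [(model, size)]])
    return [placed for _, placed in gpus]


if __name__ == "__main__":
    plan = greedy_plan({"model_a": 1000, "model_b": 400}, THROUGHPUT)
    for i, gpu in enumerate(plan):
        print(f"GPU {i}: {gpu}")
```

With these made-up numbers, model_a receives five 2-slice instances and model_b three 3-slice instances, which pack onto three GPUs. A single greedy pass like this is fast but commits to one choice per step; search methods such as the GA and MCTS named in the abstract can explore alternatives it would miss.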
