MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant Systems for Machine Learning
Baolin Li, Tirthak Patel, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari

TL;DR
MISO leverages NVIDIA's Multi-Instance GPU capability and Multi-Process Service to dynamically partition GPU resources among co-located jobs, significantly improving resource utilization and reducing job completion times.
Contribution
MISO introduces a dynamic partitioning technique that uses MPS to predict the best MIG partition allocation for co-located jobs, avoiding the overhead of reconfiguring MIG partitions during exploration.
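The selection step behind this idea can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes that some profiling phase (for example, running jobs together under MPS) has already produced a predicted normalized throughput for each job at each MIG slice size, and it simply enumerates allocations to find the highest total. The function `best_partition`, the job names, and every number in `predicted` are hypothetical. The sketch relies on the A100 exposing seven compute slices with instance sizes of 1, 2, 3, 4, or 7 slices, and it deliberately ignores the placement and memory-slice rules that real MIG configurations must satisfy.

```python
from itertools import product

TOTAL_SLICES = 7               # compute slices on an A100 in MIG mode
SLICE_SIZES = (1, 2, 3, 4, 7)  # instance sizes, in compute slices

def best_partition(speed):
    """Pick the slice size per job that maximizes total predicted throughput.

    speed maps job name -> {slice size: predicted normalized throughput}.
    Simplification (assumption): any combination whose sizes sum to at most
    TOTAL_SLICES is treated as valid; real MIG placement rules are stricter.
    """
    jobs = sorted(speed)
    best_alloc, best_score = None, float("-inf")
    for sizes in product(SLICE_SIZES, repeat=len(jobs)):
        if sum(sizes) > TOTAL_SLICES:
            continue  # does not fit on one GPU
        score = sum(speed[job][size] for job, size in zip(jobs, sizes))
        if score > best_score:
            best_alloc, best_score = dict(zip(jobs, sizes)), score
    return best_alloc, best_score

# Hypothetical predictions: job_b saturates early, job_a keeps scaling.
predicted = {
    "job_a": {1: 0.15, 2: 0.30, 3: 0.45, 4: 0.60, 7: 1.00},
    "job_b": {1: 0.70, 2: 0.75, 3: 0.78, 4: 0.80, 7: 0.80},
    "job_c": {1: 0.25, 2: 0.50, 3: 0.62, 4: 0.70, 7: 0.75},
}

allocation, score = best_partition(predicted)
print(allocation, round(score, 2))
```

With these made-up numbers, the job that saturates early (`job_b`) gets the smallest instance and the job that keeps scaling (`job_a`) gets the largest. Exhaustive enumeration is workable here only because a single GPU hosts a handful of jobs.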
Findings
Achieves 49% lower average job completion time compared to unpartitioned schemes.
Achieves 16% lower average job completion time compared to static partition schemes.
Effectively utilizes GPU resources in multi-tenant environments.
Abstract
GPU technology has been improving at an expedited pace in terms of size and performance, empowering HPC and AI/ML researchers to advance the scientific discovery process. However, this also leads to inefficient resource usage, as most GPU workloads, including complicated AI/ML models, are not able to utilize the GPU resources to their fullest extent -- encouraging support for GPU multi-tenancy. We propose MISO, a technique to exploit the Multi-Instance GPU (MIG) capability on the latest NVIDIA datacenter GPUs (e.g., A100, H100) to dynamically partition GPU resources among co-located jobs. MISO's key insight is to use the lightweight, more flexible Multi-Process Service (MPS) capability to predict the best MIG partition allocation for different jobs, without incurring the overhead of implementing them during exploration. Due to its ability to utilize GPU resources more efficiently, MISO…
Taxonomy
Topics: Parallel Computing and Optimization Techniques
