MISO: Exploiting Multi-Instance GPU Capability on Multi-Tenant Systems for Machine Learning

Baolin Li, Tirthak Patel, Siddharth Samsi, Vijay Gadepally, Devesh Tiwari

arXiv:2207.11428 · cs.DC · October 10, 2022



TL;DR

MISO uses NVIDIA's Multi-Process Service (MPS) to predict the best Multi-Instance GPU (MIG) partition for co-located jobs and then dynamically partitions the GPU accordingly, improving resource utilization and reducing average job completion time.

Contribution

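It introduces a dynamic partitioning technique that uses the lightweight, flexible MPS capability to predict the best MIG partition for each set of co-located jobs, avoiding the overhead of actually reconfiguring MIG partitions while exploring candidate allocations.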

Findings

01

Achieves 49% lower average job completion time compared to unpartitioned schemes.

02

Achieves 16% lower average job completion time than the optimal static partition scheme.

03

Effectively utilizes GPU resources in multi-tenant environments.

Abstract

GPU technology has been improving at an expedited pace in terms of size and performance, empowering HPC and AI/ML researchers to advance the scientific discovery process. However, this also leads to inefficient resource usage, as most GPU workloads, including complicated AI/ML models, are not able to utilize the GPU resources to their fullest extent -- encouraging support for GPU multi-tenancy. We propose MISO, a technique to exploit the Multi-Instance GPU (MIG) capability on the latest NVIDIA datacenter GPUs (e.g., A100, H100) to dynamically partition GPU resources among co-located jobs. MISO's key insight is to use the lightweight, more flexible Multi-Process Service (MPS) capability to predict the best MIG partition allocation for different jobs, without incurring the overhead of implementing them during exploration. Due to its ability to utilize GPU resources more efficiently, MISO…
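The abstract does not spell out how MISO turns its MPS-based predictions into a MIG allocation. The following is a minimal sketch of that final selection step only, under stated assumptions: per-job throughput estimates for each MIG slice size (for example, derived from MPS co-location profiling) are already available, one MIG instance is assigned per job, and the goal is to maximize the sum of predicted normalized throughput. `CANDIDATE_PARTITIONS`, `best_partition`, and the `speedup` table are hypothetical names, and the partition list is an illustrative subset rather than NVIDIA's full set of valid A100 layouts.

```python
from itertools import permutations

# Hypothetical, illustrative subset of A100 MIG partitions, each written as the
# number of compute slices (out of 7) per instance. NOT the complete valid set.
CANDIDATE_PARTITIONS = [
    (7,),
    (4, 3),
    (3, 3),
    (4, 2, 1),
    (3, 2, 1),
    (4, 1, 1, 1),
]


def best_partition(speedup):
    """Pick the partition and job-to-instance mapping with the highest total
    predicted throughput.

    speedup[job][slices] is an assumed input: predicted throughput of `job`
    on an instance with `slices` compute slices, normalized to a full GPU.
    Returns (score, partition, assignment); partition is None if no candidate
    partition has exactly one instance per job.
    """
    jobs = list(speedup)
    best = (float("-inf"), None, None)
    for part in CANDIDATE_PARTITIONS:
        if len(part) != len(jobs):
            continue  # assume exactly one MIG instance per co-located job
        # Try every distinct way of mapping the jobs onto this partition's instances.
        for order in set(permutations(part)):
            score = sum(speedup[j][s] for j, s in zip(jobs, order))
            if score > best[0]:
                best = (score, part, dict(zip(jobs, order)))
    return best


# Usage with made-up throughput estimates for two co-located jobs.
speedup = {
    "job_a": {3: 0.80, 4: 0.95, 7: 1.0},
    "job_b": {3: 0.60, 4: 0.70, 7: 1.0},
}
score, part, assign = best_partition(speedup)
print(part, assign)
```

Exhaustive enumeration is only reasonable here because a single GPU admits a small number of partitions and co-located jobs; the point of the sketch is that the expensive part (measuring each job on each slice size) is replaced by predictions, so no MIG reconfiguration happens while candidates are being compared.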


Taxonomy

Topics: Parallel Computing and Optimization Techniques