Taming GPU Underutilization via Static Partitioning and Fine-grained CPU Offloading
Gabin Schieffer, Ruimin Shi, Jie Ren, Ivy Peng

TL;DR
This paper analyzes the limitations of current GPU sharing options and proposes a memory-offloading scheme over NVLink-C2C that reduces GPU underutilization across diverse workloads.
Contribution
It provides a system-level characterization of GPU sharing options and introduces a novel memory-offloading scheme that addresses mismatches between the resources provisioned to an application and the resources it actually needs.
Findings
GPU sharing via MIG reduces underutilization but still faces interference issues.
Coarse-grained provisioning often mismatches application needs.
Memory offloading via NVLink-C2C improves resource utilization.
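To make the provisioning mismatch concrete, here is a minimal Python sketch. It is not the authors' scheme. It assumes a hypothetical set of slice profiles, loosely named after MIG's "<N>g.<M>gb" convention but not an exact list for any GPU, and a simplified rule: without offloading, a job needs a slice that covers both its compute and its memory. With offloading to host memory (e.g., over NVLink-C2C), it only needs enough compute, and any excess memory spills to the CPU side. The names `PROFILES`, `Placement`, and `place` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical slice profiles: (name, compute units out of 7, memory in GB),
# ordered from smallest to largest. Illustrative only, not a real MIG table.
PROFILES = [
    ("1g.12gb", 1, 12),
    ("2g.24gb", 2, 24),
    ("3g.48gb", 3, 48),
    ("4g.48gb", 4, 48),
    ("7g.96gb", 7, 96),
]


@dataclass
class Placement:
    profile: str
    idle_compute: int    # compute units allocated but not needed by the job
    offloaded_gb: float  # memory that does not fit in the slice and spills to host


def place(compute_need: int, mem_need_gb: float, allow_offload: bool) -> Optional[Placement]:
    """Return the first (smallest) profile that satisfies the job, or None.

    Without offloading, the slice must cover both compute and memory, so a
    memory-hungry but compute-light job is pushed onto a larger slice and
    leaves compute idle. With offloading, the first slice with enough compute
    is taken and the memory shortfall is assigned to host memory.
    """
    for name, units, mem in PROFILES:
        if units < compute_need:
            continue  # not enough compute in this slice
        if mem >= mem_need_gb:
            return Placement(name, units - compute_need, 0.0)
        if allow_offload:
            return Placement(name, units - compute_need, float(mem_need_gb - mem))
    return None  # no profile fits under this policy


print(place(1, 40, allow_offload=False))
# Placement(profile='3g.48gb', idle_compute=2, offloaded_gb=0.0)
print(place(1, 40, allow_offload=True))
# Placement(profile='1g.12gb', idle_compute=0, offloaded_gb=28.0)
```

In this toy model, a job that needs one compute unit but 40 GB is forced onto a three-unit slice without offloading, which leaves two units idle. With offloading, it fits a one-unit slice at the cost of keeping 28 GB in host memory. The sketch deliberately ignores the bandwidth and latency penalty of host-resident data, which a real scheme would have to weigh.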
Abstract
Advances in GPU compute throughput and memory capacity bring significant opportunities to a wide range of workloads. However, efficiently utilizing these resources remains challenging, particularly because diverse application characteristics may result in imbalanced utilization. Multi-Instance GPU (MIG) is a promising approach to improve utilization by partitioning GPU compute and memory resources into fixed-size, isolated slices. Yet its effectiveness and limitations in supporting HPC workloads remain an open question. We present a comprehensive system-level characterization of different GPU sharing options using real-world scientific, AI, and data analytics applications, including NekRS, LAMMPS, Llama3, and Qiskit. Our analysis reveals that while GPU sharing via MIG can significantly reduce resource underutilization and enable system-level improvements in throughput and…
