SGDRC: Software-Defined Dynamic Resource Control for Concurrent DNN Inference on NVIDIA GPUs
Yongkang Zhang, Haoxuan Yu, Chenxia Han, Cheng Wang, Baotong Lu, Yunzhe Li, Zhifeng Jiang, Yang Li, Xiaowen Chu, Huaicheng Li

TL;DR
SGDRC is a software-defined solution that dynamically manages VRAM bandwidth and compute units on NVIDIA GPUs, improving throughput (by up to 1.47x overall) and service-level objective (SLO) attainment for concurrent DNN inference workloads.
Contribution
This paper introduces SGDRC, a novel software-based approach for dynamic resource management on NVIDIA GPUs, addressing the lack of flexible, hardware-agnostic solutions for concurrent DNN inference.
Findings
Achieves 99.0% SLO attainment rate on average.
Increases overall throughput by up to 1.47x.
Increases best-effort (BE) job throughput by up to 2.36x.
Abstract
Cloud service providers heavily colocate high-priority, latency-sensitive (LS) and low-priority, best-effort (BE) DNN inference services on the same GPU to improve resource utilization in data centers. Among the critical shared GPU resources, there has been very limited analysis of the dynamic allocation of compute units and VRAM bandwidth, mainly for two reasons: (1) the native GPU resource management solutions are either hardware-specific, or unable to dynamically allocate resources to different tenants, or both; (2) NVIDIA doesn't expose interfaces for VRAM bandwidth allocation, and the software stack and VRAM channel architectures are black boxes, both of which limit software-level resource management. These limitations drive prior work to design either conservative sharing policies detrimental to throughput, or static resource partitioning only applicable to a few GPU models. To bridge…
Taxonomy
Topics: Radiation Detection and Scintillator Technologies · Atomic and Subatomic Physics Research · Geophysical Methods and Applications
