Effective GPU Sharing Under Compiler Guidance
Chao Chen, Chris Porter, Santosh Pande

TL;DR
This paper introduces an automated, compiler-guided GPU scheduling framework that optimizes resource sharing among multiple kernels, significantly improving throughput and turnaround time without degrading individual kernel performance.
Contribution
It presents a novel compiler-assisted scheduling approach that dynamically allocates GPU resources based on kernel requirements, outperforming existing solutions.
Findings
Up to 2.5x throughput improvement on benchmarks
Up to 4.9x reduction in job turnaround time
Kernel performance degradation limited to 2.5%
Abstract
Modern computing platforms tend to deploy multiple GPUs (2, 4, or more) on a single node to boost system performance, with each GPU having a large capacity of global memory and many streaming multiprocessors (SMs). GPUs are an expensive resource, and boosting their utilization without degrading the performance of individual workloads is an important and challenging problem. Although services like MPS support simultaneous execution of multiple co-operative kernels on a single device, they do not solve this problem for uncooperative kernels, because MPS is oblivious to the resource needs of each kernel. We propose a fully automated compiler-assisted scheduling framework. The compiler constructs GPU tasks by identifying kernel launches and their related GPU operations (e.g., memory allocations). For each GPU task, a probe is instrumented in the host-side code right before its launch…
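The abstract is cut off before it says what the probe does, so the following is only a minimal sketch of the general idea (a task announces its resource needs before launch and a scheduler places it on a GPU with enough room), not the paper's implementation. The names `GpuTask`, `Device`, `probe`, and `release` are hypothetical, as are the choice of global memory and SM count as the tracked resources and the first-fit placement policy.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GpuTask:
    """Hypothetical descriptor: one kernel launch plus its related GPU operations."""
    name: str
    mem_bytes: int  # global memory needed by the task's allocations (assumed)
    sms: int        # streaming multiprocessors requested (assumed)


@dataclass
class Device:
    """Hypothetical per-GPU bookkeeping kept by a scheduler."""
    dev_id: int
    free_mem: int
    free_sms: int
    running: List[str] = field(default_factory=list)


def probe(task: GpuTask, devices: List[Device]) -> Optional[int]:
    """Run right before a launch: reserve resources on the first device that fits.

    Returns the chosen device id, or None if no device currently has room.
    First-fit is an illustrative assumption, not the paper's policy.
    """
    for dev in devices:
        if dev.free_mem >= task.mem_bytes and dev.free_sms >= task.sms:
            dev.free_mem -= task.mem_bytes
            dev.free_sms -= task.sms
            dev.running.append(task.name)
            return dev.dev_id
    return None


def release(task: GpuTask, dev: Device) -> None:
    """Return a finished task's resources to its device."""
    dev.free_mem += task.mem_bytes
    dev.free_sms += task.sms
    dev.running.remove(task.name)


if __name__ == "__main__":
    gib = 1 << 30
    gpus = [Device(0, 16 * gib, 80), Device(1, 16 * gib, 80)]
    a = GpuTask("a", 12 * gib, 40)
    b = GpuTask("b", 8 * gib, 40)
    print(probe(a, gpus))  # a is placed on device 0
    print(probe(b, gpus))  # b does not fit next to a, so it goes to device 1
```

Tracking resources per task is what lets two kernels share a device only when both fit, which is the property the abstract says MPS lacks.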
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Neural Network Applications
