Effective GPU Sharing Under Compiler Guidance

Chao Chen; Chris Porter; Santosh Pande

arXiv:2107.08538 · cs.DC · July 20, 2021

TL;DR

This paper introduces an automated, compiler-guided GPU scheduling framework for sharing GPU resources among multiple kernels. It reports up to 2.5x higher throughput and up to 4.9x shorter job turnaround time, with individual kernel performance degradation limited to 2.5%.

Contribution

It presents a compiler-assisted scheduling approach that allocates GPU resources at run time based on each kernel's requirements, unlike MPS, which is oblivious to them.

Findings

01  Up to 2.5x throughput improvement on benchmarks

02  Up to 4.9x reduction in job turnaround time

03  Kernel performance degradation limited to 2.5%

Abstract

Modern computing platforms tend to deploy multiple GPUs (2, 4, or more) on a single node to boost system performance, with each GPU having a large capacity of global memory and streaming multiprocessors (SMs). GPUs are an expensive resource, and boosting utilization of GPUs without causing performance degradation of individual workloads is an important and challenging problem. Although services like MPS support simultaneous execution of multiple co-operative kernels on a single device, they do not solve the above problem for uncooperative kernels, MPS being oblivious to the resource needs of each kernel. We propose a fully automated compiler-assisted scheduling framework. The compiler constructs GPU tasks by identifying kernel launches and their related GPU operations (e.g. memory allocations). For each GPU task, a probe is instrumented in the host-side code right before its launch…
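
To make the idea in the abstract concrete, the sketch below models a GPU task as a kernel launch plus its related memory allocations, and a host-side probe that reports the task's resource needs to a scheduler right before launch. This is a minimal illustration only: the names `GpuTask`, `Gpu`, `Scheduler`, `probe`, and `release`, the use of thread blocks as a proxy for SM demand, and the first-fit placement policy are all assumptions, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class GpuTask:
    """A kernel launch plus the GPU operations tied to it (hypothetical model)."""
    kernel: str
    alloc_bytes: List[int]   # sizes of the related device memory allocations
    thread_blocks: int       # blocks requested by the launch (assumed SM proxy)

    @property
    def mem_needed(self) -> int:
        return sum(self.alloc_bytes)


@dataclass
class Gpu:
    dev_id: int
    free_mem: int            # bytes of global memory not yet reserved
    free_blocks: int         # rough proxy for spare SM capacity


class Scheduler:
    """Toy resource-aware scheduler: picks the first device that fits the task."""

    def __init__(self, gpus: List[Gpu]) -> None:
        self.gpus = gpus

    def probe(self, task: GpuTask) -> Optional[int]:
        """Called right before a launch; returns a device id, or None to wait."""
        for gpu in self.gpus:
            fits_mem = gpu.free_mem >= task.mem_needed
            fits_sm = gpu.free_blocks >= task.thread_blocks
            if fits_mem and fits_sm:
                # Reserve the resources so later probes see the reduced capacity.
                gpu.free_mem -= task.mem_needed
                gpu.free_blocks -= task.thread_blocks
                return gpu.dev_id
        return None

    def release(self, task: GpuTask, dev_id: int) -> None:
        """Return a finished task's reservation to its device."""
        gpu = next(g for g in self.gpus if g.dev_id == dev_id)
        gpu.free_mem += task.mem_needed
        gpu.free_blocks += task.thread_blocks
```

In this sketch a task that does not fit on any device gets `None` back from `probe` and would have to wait until a `release` frees capacity. That is one simple way to avoid oversubscribing a device; a real scheduler could choose a different policy.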

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Neural Network Applications