Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling

Jianlong Zhong; Bingsheng He

arXiv:1303.5164 · cs.DC · March 22, 2013 · 5 cites

TL;DR

Kernelet is a runtime system that improves GPU throughput in shared environments by dynamically slicing kernels into sub-kernels with tunable occupancy and co-scheduling them under the guidance of a Markov chain performance model, with reported performance improvements of up to 31.1% on a Tesla C2050 and 23.4% on a GTX680.

Contribution

It introduces a novel dynamic slicing and scheduling approach for concurrent GPU kernels, guided by a Markov chain performance model, to optimize throughput.

Findings

01. Achieves up to 31.1% performance improvement on Tesla C2050.

02. Achieves up to 23.4% performance improvement on GTX680.

03. Effectively utilizes GPU resources in shared environments.

Abstract

Graphics processors, or GPUs, have recently been widely used as accelerators in shared environments such as clusters and clouds. In such shared environments, many kernels are submitted to GPUs from different users, and throughput is an important metric for performance and total cost of ownership. Despite the recently improved runtime support for concurrent GPU kernel executions, the GPU can be severely underutilized, resulting in suboptimal throughput. In this paper, we propose Kernelet, a runtime system with dynamic slicing and scheduling techniques to improve the throughput of concurrent kernel executions on the GPU. With slicing, Kernelet divides a GPU kernel into multiple sub-kernels (namely slices). Each slice has tunable occupancy to allow co-scheduling with other slices and to fully utilize the GPU resources. We develop a novel and effective Markov chain-based performance model…
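The slicing idea in the abstract can be illustrated with a small host-side sketch. It is not Kernelet's implementation: it assumes a kernel is described only by a 1-D grid of thread blocks, that a slice is a contiguous range of those blocks, and it swaps in a plain round-robin interleaving where the paper uses a Markov chain performance model to choose which slices to co-schedule. The names `slice_grid` and `co_schedule` are hypothetical.

```python
from itertools import zip_longest
from typing import Iterator, List, Tuple


def slice_grid(total_blocks: int, slice_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (block_offset, num_blocks) for each slice of a 1-D grid.

    Assumption: a slice is a contiguous run of thread blocks; the last
    slice may be smaller than slice_size.
    """
    if total_blocks < 0 or slice_size <= 0:
        raise ValueError("total_blocks must be >= 0 and slice_size > 0")
    for offset in range(0, total_blocks, slice_size):
        yield offset, min(slice_size, total_blocks - offset)


def co_schedule(
    kernel_a: Tuple[int, int], kernel_b: Tuple[int, int]
) -> List[Tuple[str, int, int]]:
    """Interleave the slices of two kernels in round-robin order.

    Each kernel is given as (total_blocks, slice_size). Round-robin is a
    placeholder policy for illustration, not the model-guided policy the
    paper describes.
    """
    a = [("A", off, n) for off, n in slice_grid(*kernel_a)]
    b = [("B", off, n) for off, n in slice_grid(*kernel_b)]
    order: List[Tuple[str, int, int]] = []
    for slice_a, slice_b in zip_longest(a, b):
        if slice_a is not None:
            order.append(slice_a)
        if slice_b is not None:
            order.append(slice_b)
    return order


if __name__ == "__main__":
    # Kernel A: 10 blocks in slices of 4; kernel B: 3 blocks in slices of 2.
    for name, offset, count in co_schedule((10, 4), (3, 2)):
        print(f"launch {name}: blocks [{offset}, {offset + count})")
```

In a real runtime each `(block_offset, num_blocks)` pair would correspond to one sub-kernel launch, with the offset added to the block index inside the kernel so that every slice works on its own portion of the original grid. The slice size is the knob that controls how much of the GPU a slice occupies, leaving room for a slice from another kernel to run alongside it.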

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management · Advanced Data Storage Technologies