Compiler-Assisted Workload Consolidation For Efficient Dynamic Parallelism on GPU

Hancheng Wu; Da Li; Michela Becchi

arXiv:1606.08150·cs.DC·November 18, 2016

TL;DR

This paper introduces compiler techniques to optimize dynamic parallelism on GPUs, significantly reducing overhead and boosting performance for complex parallel algorithms with irregular or recursive structures.

Contribution

The paper proposes three workload consolidation schemes, implemented in a directive-based compiler, to improve GPU utilization in applications that use dynamic parallelism.

Findings

01 Achieved up to 3300x speedup over naive Dynamic Parallelism (DP) solutions

02 Reduced the runtime overhead of DP-based codes

03 Improved GPU utilization for irregular and recursive algorithms

Abstract

GPUs have been widely used to accelerate computations exhibiting simple patterns of parallelism - such as flat or two-level parallelism - and a degree of parallelism that can be statically determined based on the size of the input dataset. However, the effective use of GPUs for algorithms exhibiting complex patterns of parallelism, possibly known only at runtime, is still an open problem. Recently, Nvidia has introduced Dynamic Parallelism (DP) in its GPUs. By making it possible to launch kernels directly from GPU threads, this feature enables nested parallelism at runtime. However, the effective use of DP must still be understood: a naive use of this feature may suffer from significant runtime overhead and lead to GPU underutilization, resulting in poor performance. In this work, we target this problem. First, we demonstrate how a naive use of DP can result in poor performance. Second,…
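
To make the overhead argument concrete, here is a minimal back-of-the-envelope model in Python rather than GPU code. It only counts child kernel launches: under naive DP, every parent thread that finds nested work launches its own child kernel, whereas a consolidated variant merges the work of a group of parent threads into a single launch. The function names, the `group_size` parameter, and the grouping granularity are all assumptions made for illustration; they are not taken from the paper and do not necessarily match any of its three schemes.

```python
from typing import List


def naive_launches(work: List[int]) -> int:
    """Naive DP model: one child launch per parent thread with nested work.

    work[i] is the (hypothetical) number of nested work items that
    parent thread i discovers at runtime.
    """
    return sum(1 for items in work if items > 0)


def consolidated_launches(work: List[int], group_size: int) -> int:
    """Consolidated model: one child launch per group of parent threads.

    Parent threads are split into consecutive groups of `group_size`
    (an illustrative assumption). A group launches a single child
    kernel if any of its threads has nested work.
    """
    if group_size <= 0:
        raise ValueError("group_size must be positive")
    launches = 0
    for start in range(0, len(work), group_size):
        group = work[start:start + group_size]
        if any(items > 0 for items in group):
            launches += 1
    return launches


if __name__ == "__main__":
    # Irregular workload: most threads have little or no nested work.
    work = [3, 0, 5, 1, 0, 0, 0, 0, 2]
    print("naive launches:       ", naive_launches(work))
    print("consolidated (size 4):", consolidated_launches(work, 4))
    print("consolidated (all):   ", consolidated_launches(work, len(work)))
```

In this model the total number of work items is unchanged; only the number of launches shrinks as the group grows. That matches the abstract's point that the cost of naive DP comes from runtime overhead and underutilization, not from the nested work itself.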
