A Compiler Framework for Optimizing Dynamic Parallelism on GPUs

Mhd Ghaith Olabi; Juan Gómez Luna; Onur Mutlu; Wen-mei Hwu; Izzat El Hajj

arXiv:2201.02789 · cs.DC · January 11, 2022

TL;DR

This paper introduces a compiler framework with three key optimizations—thresholding, coarsening, and aggregation—to improve the performance of dynamic parallelism on GPUs, especially for irregular nested workloads.

Contribution

It presents a novel compiler framework that optimizes dynamic parallelism on GPUs through three techniques (thresholding, coarsening, and aggregation), reducing the performance penalty of launching many small child grids.
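To make "aggregation" concrete, here is a minimal host-side C++ model of one common way to combine the child grids that many parent threads would launch into a single larger grid. It assumes that each parent's request size is prefix-summed, one grid covering the total is launched, and each aggregated thread recovers its original parent by binary search. This is an illustrative assumption, not the framework's generated code; `AggregatedGrid`, `aggregate`, `totalThreads`, and `mapThread` are hypothetical names.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical model of launch aggregation: instead of parent p launching its
// own child grid of sizes[p] threads, all requests are combined into one grid.
struct AggregatedGrid {
    // prefix[p] = first global thread index belonging to parent p;
    // prefix.back() = total number of threads in the aggregated grid.
    std::vector<std::size_t> prefix;
};

// Builds the aggregated grid with an exclusive prefix sum over request sizes
// (on a GPU this would be a parallel scan across the parent threads).
AggregatedGrid aggregate(const std::vector<std::size_t>& sizes) {
    AggregatedGrid g;
    g.prefix.assign(sizes.size() + 1, 0);
    for (std::size_t p = 0; p < sizes.size(); ++p)
        g.prefix[p + 1] = g.prefix[p] + sizes[p];
    return g;
}

// Size of the single launch that replaces all the per-parent launches.
std::size_t totalThreads(const AggregatedGrid& g) { return g.prefix.back(); }

// Maps global thread t (t < totalThreads(g)) of the aggregated launch back to
// (original parent, local thread index within that parent's request).
std::pair<std::size_t, std::size_t> mapThread(const AggregatedGrid& g, std::size_t t) {
    // The first prefix entry strictly greater than t is the start of the next
    // parent; the entry before it is t's parent. Empty requests are skipped.
    auto it = std::upper_bound(g.prefix.begin(), g.prefix.end(), t);
    const std::size_t parent = static_cast<std::size_t>(it - g.prefix.begin()) - 1;
    return {parent, t - g.prefix[parent]};
}
```

For requests of sizes {3, 0, 2}, the three separate launches (two of them tiny, one empty) collapse into one grid of 5 threads, and global thread 3 maps to parent 2, local index 0. The design choice is to pay a small per-thread lookup cost in exchange for fewer, larger launches.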

Findings

1. Achieves a 43.0x geometric mean speedup over non-optimized dynamic parallelism.
2. Attains an 8.7x speedup over applications without dynamic parallelism.
3. Provides a 3.6x improvement over prior dynamic parallelism aggregation methods.

Abstract

Dynamic parallelism on GPUs allows GPU threads to dynamically launch other GPU threads. It is useful in applications with nested parallelism, particularly where the amount of nested parallelism is irregular and cannot be predicted beforehand. However, prior works have shown that dynamic parallelism may impose a high performance penalty when a large number of small grids are launched. The large number of launches results in high launch latency due to congestion, and the small grid sizes result in hardware underutilization. To address this issue, we propose a compiler framework for optimizing the use of dynamic parallelism in applications with nested parallelism. The framework features three key optimizations: thresholding, coarsening, and aggregation. Thresholding involves launching a grid dynamically only if the number of child threads exceeds some threshold, and serializing the child…
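The thresholding rule described above (launch a child grid only when the number of child threads exceeds a threshold, otherwise serialize the child work in the parent thread) can be sketched as a host-side C++ emulation, since a real dynamic-parallelism example needs a GPU and relocatable device code. A child-grid launch is modeled as a recorded `ChildLaunch` descriptor. `ChildLaunch`, `childWork`, `parentThread`, `runChildGrids`, the CSR-style `offsets` layout, and the threshold value are all hypothetical; this is a sketch of the idea, not the framework's generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a child-grid launch. On a GPU the parent thread
// would instead execute something like child<<<blocks, threadsPerBlock>>>(...).
struct ChildLaunch {
    std::size_t start;  // first element handled by the child grid
    std::size_t count;  // number of child threads requested
};

// Work performed by one child thread (illustrative: increment one element).
inline void childWork(std::vector<int>& data, std::size_t i) { data[i] += 1; }

// Emulates parent thread p. offsets is CSR-style: parent p owns the elements
// [offsets[p], offsets[p + 1]) and needs one child thread per element.
void parentThread(std::vector<int>& data, const std::vector<std::size_t>& offsets,
                  std::size_t p, std::size_t threshold,
                  std::vector<ChildLaunch>& launches) {
    assert(p + 1 < offsets.size());
    const std::size_t start = offsets[p];
    const std::size_t n = offsets[p + 1] - start;
    if (n > threshold) {
        // Enough nested parallelism: launch a child grid (recorded here).
        launches.push_back({start, n});
    } else {
        // Too little to justify a launch: serialize the child threads' work
        // inside the parent thread.
        for (std::size_t i = 0; i < n; ++i) childWork(data, start + i);
    }
}

// Emulates the device later executing every recorded child grid.
void runChildGrids(std::vector<int>& data, const std::vector<ChildLaunch>& launches) {
    for (const ChildLaunch& l : launches)
        for (std::size_t t = 0; t < l.count; ++t) childWork(data, l.start + t);
}
```

With offsets {0, 2, 3, 103, 104} (parents needing 2, 1, 100, and 1 child threads) and a threshold of 32, only the 100-thread parent launches a grid; the other three handle their few elements inline. A threshold of 0 reproduces the unoptimized behavior of one launch per non-empty parent, so the threshold trades launch overhead against serial work in the parent.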

Code & Models

Repositories

gaitholabi/klap-cgo22 (official)