Modeling GPU Dynamic Parallelism for Self Similar Density Workloads
Felipe A. Quezada, Cristóbal A. Navarro, Miguel Romero, Cristhian Aguilera

TL;DR
This paper introduces a cost model and a new subdivision method called ASK for GPU workloads with self-similar density (SSD), and shows that ASK runs faster than both CUDA Dynamic Parallelism (DP) and a basic exhaustive approach.
Contribution
It presents a subdivision cost model for SSD workloads and proposes ASK as a lower-overhead alternative to CUDA DP, validated through a Mandelbrot set case study.
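The contrast between nested kernel launches and a flatter alternative can be sketched on the CPU. The Python below is a minimal sketch under stated assumptions, not ASK itself: it assumes a host-driven scheme in which each subdivision level is processed as one batch of equal-sized regions (standing in for one kernel launch), and any region whose border is not uniform is appended to the next batch instead of launching child kernels. The border test is the Mariani-Silver rule commonly used for the Mandelbrot set. The names `g` (initial grid per side), `r` (subdivision factor per side), `B` (region side at which subdivision stops), `MAX_DWELL`, the helper functions, and the viewing window are all hypothetical.

```python
MAX_DWELL = 64  # hypothetical escape-time cap


def dwell(cx: float, cy: float) -> int:
    """Escape-time iteration count of c = cx + i*cy, capped at MAX_DWELL."""
    zx = zy = 0.0
    for i in range(MAX_DWELL):
        if zx * zx + zy * zy > 4.0:
            return i
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return MAX_DWELL


def to_complex(x: int, y: int, n: int) -> tuple[float, float]:
    """Map pixel (x, y) of an n x n image onto the window [-2, 0.5] x [-1.25, 1.25]."""
    return -2.0 + 2.5 * x / n, -1.25 + 2.5 * y / n


def border_pixels(x0: int, y0: int, s: int) -> set[tuple[int, int]]:
    """Unique pixels on the border of the s x s region whose top-left corner is (x0, y0)."""
    pts = set()
    for i in range(s):
        pts.update({(x0 + i, y0), (x0 + i, y0 + s - 1), (x0, y0 + i), (x0 + s - 1, y0 + i)})
    return pts


def mandelbrot_by_levels(n: int, g: int, r: int, B: int):
    """Subdivide level by level; each pass over `batch` stands in for one kernel launch."""
    if n % g or r < 2:
        raise ValueError("need n divisible by g and r >= 2")
    img = [[-1] * n for _ in range(n)]
    s = n // g  # every region in the current batch has side s
    batch = [(i * s, j * s) for j in range(g) for i in range(g)]
    passes = 0
    while batch:
        passes += 1
        next_batch = []
        for x0, y0 in batch:
            values = set()
            for x, y in border_pixels(x0, y0, s):
                img[y][x] = dwell(*to_complex(x, y, n))
                values.add(img[y][x])
            interior = [(x, y) for y in range(y0 + 1, y0 + s - 1)
                        for x in range(x0 + 1, x0 + s - 1)]
            if len(values) == 1:  # uniform border: fill the interior without iterating
                d = next(iter(values))
                for x, y in interior:
                    img[y][x] = d
            elif s > B and s % r == 0:  # defer the r x r children to the next pass
                child = s // r
                next_batch += [(x0 + i * child, y0 + j * child)
                               for j in range(r) for i in range(r)]
            else:  # no further split (s <= B or s not divisible by r): compute interior
                for x, y in interior:
                    img[y][x] = dwell(*to_complex(x, y, n))
        s //= r
        batch = next_batch
    return img, passes
```

For example, `mandelbrot_by_levels(64, 4, 2, 4)` makes at most three passes (region sides 16, 8 and 4). In this sketch the number of passes is bounded by the number of subdivision levels, no matter how many regions subdivide, whereas a recursive scheme issues one child launch per subdividing region.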
Findings
ASK runs up to 60% faster than CUDA DP.
ASK is up to 12 times faster than basic exhaustive methods.
The cost model accurately predicts optimal subdivision parameters.
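The paper's cost model itself is not reproduced in this summary. To illustrate the kind of trade-off such a model captures, here is a deliberately simplified toy model of our own, not the authors'. It assumes, hypothetically, that every region at every level has the same probability `p` of a non-uniform border (a crude stand-in for self-similar density) and charges a hypothetical `launch_cost` per subdivision to mimic launch overhead. Work is counted in escape-time evaluations, and `g`, `r`, `B` are hypothetical names for the initial grid per side, the subdivision factor per side, and the region side at which subdivision stops.

```python
def region_work(s: int, r: int, B: int, p: float, launch_cost: float) -> float:
    """Expected escape-time evaluations for one s x s region (toy model).

    The border always costs 4(s - 1) evaluations. With probability p the border
    is not uniform; the region then either pays `launch_cost` and recurses into
    r*r children (while s > B) or computes its (s - 2)^2 interior pixels directly.
    """
    border = 4 * (s - 1) if s > 1 else 1
    if s > B and s % r == 0:
        return border + p * (launch_cost + r * r * region_work(s // r, r, B, p, launch_cost))
    return border + p * max(s - 2, 0) ** 2


def total_work(n: int, g: int, r: int, B: int, p: float, launch_cost: float) -> float:
    """Expected work for an n x n image that is first split into a g x g grid."""
    if n % g or r < 2:
        raise ValueError("need n divisible by g and r >= 2")
    return g * g * region_work(n // g, r, B, p, launch_cost)


def best_parameters(n, p, launch_cost, gs=(1, 2, 4, 8, 16), rs=(2, 4), Bs=(4, 8, 16, 32)):
    """Search all (g, r, B) combinations; return (work, g, r, B) with the lowest work."""
    best = None
    for g in gs:
        if n % g:
            continue
        for r in rs:
            for B in Bs:
                w = total_work(n, g, r, B, p, launch_cost)
                if best is None or w < best[0]:
                    best = (w, g, r, B)
    return best
```

With `p = 1` and no subdivision the toy model reduces to the exhaustive cost of n*n evaluations, and any subdivision only adds border overhead; with small `p` the large uniform regions are filled cheaply and coarse grids win. A call such as `best_parameters(1024, 0.3, 50.0)` searches the intermediate regime, where the balance between skipped interiors, recomputed borders, and launch overhead decides the optimum.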
Abstract
Dynamic Parallelism (DP) is a runtime feature of the GPU programming model that allows GPU threads to execute additional GPU kernels, recursively. Apart from making parallel hierarchical patterns easier to program, DP can also speed up problems that exhibit a heterogeneous data layout by focusing, through a subdivision process, the finite GPU resources on the sub-regions that exhibit more parallelism. However, performing an optimal subdivision is not trivial, as several parameters play an important role in the final performance of DP. Moreover, the current programming abstraction for DP also introduces an overhead that can penalize the final performance. In this work we present a subdivision cost model for problems that exhibit self-similar density (SSD) workloads (such as fractals), in order to understand which parameters provide the fastest subdivision approach.…
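The subdivision process described in the abstract is often illustrated on the Mandelbrot set with the Mariani-Silver border test: if every pixel on a region's border has the same escape-time value (dwell), the interior is filled with that value; otherwise the region is split and each part is handled the same way. The Python below is a minimal CPU sketch under that assumption, in which each recursive call plays the role of a child kernel launched through DP. The helper names, `MAX_DWELL`, the viewing window, and the parameters `g` (initial grid per side), `r` (subdivision factor per side) and `B` (region side at which subdivision stops) are hypothetical and are not taken from the paper.

```python
MAX_DWELL = 64  # hypothetical escape-time cap


def dwell(cx: float, cy: float) -> int:
    """Escape-time iteration count of c = cx + i*cy, capped at MAX_DWELL."""
    zx = zy = 0.0
    for i in range(MAX_DWELL):
        if zx * zx + zy * zy > 4.0:
            return i
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    return MAX_DWELL


def to_complex(x: int, y: int, n: int) -> tuple[float, float]:
    """Map pixel (x, y) of an n x n image onto the window [-2, 0.5] x [-1.25, 1.25]."""
    return -2.0 + 2.5 * x / n, -1.25 + 2.5 * y / n


def subdivide(img, x0, y0, s, r, B, n, counter):
    """Handle the s x s region at (x0, y0); each recursive call mimics a child launch."""
    border = {(x0 + i, y0) for i in range(s)} | {(x0 + i, y0 + s - 1) for i in range(s)}
    border |= {(x0, y0 + i) for i in range(s)} | {(x0 + s - 1, y0 + i) for i in range(s)}
    values = set()
    for x, y in border:
        img[y][x] = dwell(*to_complex(x, y, n))
        values.add(img[y][x])
    counter[0] += len(border)
    interior = [(x, y) for y in range(y0 + 1, y0 + s - 1) for x in range(x0 + 1, x0 + s - 1)]
    if len(values) == 1:  # uniform border: fill the interior without iterating
        d = values.pop()
        for x, y in interior:
            img[y][x] = d
    elif s > B and s % r == 0:  # split into r x r children and recurse
        child = s // r
        for j in range(r):
            for i in range(r):
                subdivide(img, x0 + i * child, y0 + j * child, child, r, B, n, counter)
    else:  # no further split (s <= B or s not divisible by r): compute interior
        for x, y in interior:
            img[y][x] = dwell(*to_complex(x, y, n))
        counter[0] += len(interior)


def mandelbrot_recursive(n: int, g: int, r: int, B: int):
    """Start from a g x g grid of regions; return the dwell image and evaluation count."""
    if n % g or r < 2:
        raise ValueError("need n divisible by g and r >= 2")
    img = [[-1] * n for _ in range(n)]
    counter = [0]
    s = n // g
    for j in range(g):
        for i in range(g):
            subdivide(img, i * s, j * s, s, r, B, n, counter)
    return img, counter[0]
```

The returned counter can be compared with the n*n evaluations of an exhaustive pass. Because only border pixels are sampled, features thinner than the pixel spacing can slip between samples, so a few pixels may differ from an exhaustive computation.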
