A Similarity Measure for GPU Kernel Subgraph Matching
Robert Lim, Boyana Norris, Allen Malony

TL;DR
This paper introduces CUDAflow, a tool that analyzes CUDA kernel control flow graphs to measure similarity and extract insights, aiding optimization and understanding of GPU kernel behavior.
Contribution
CUDAflow combines static and dynamic analysis of CUDA binaries and applies subgraph matching to kernel control flow graphs (CFGs) to characterize kernel resource requirements.
Findings
Reveals new thread divergence patterns in GPU kernels
Demonstrates effectiveness on SHOC and Rodinia benchmarks
Aids in autotuning and compiler optimization
Abstract
Accelerator architectures specialize in executing SIMD (single instruction, multiple data) in lockstep. Because the majority of CUDA applications are parallelized loops, control flow information can provide an in-depth characterization of a kernel. CUDAflow is a tool that statically separates CUDA binaries into basic block regions and dynamically measures instruction and basic block frequencies. CUDAflow captures this information in a control flow graph (CFG) and performs subgraph matching across various kernels' CFGs to gain insights into an application's resource requirements, based on the shape and traversal of the graph, instruction operations executed and registers allocated, among other information. The utility of CUDAflow is demonstrated with SHOC and Rodinia application case studies on a variety of GPU architectures, revealing novel thread divergence characteristics that…
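As a rough illustration of comparing two kernels' CFGs by how they are traversed, the sketch below counts basic-block transitions in an execution trace and scores two kernels by the cosine similarity of those counts. It is not CUDAflow's actual measure: the function names, the sample traces, the choice of cosine similarity, and the assumption that basic blocks are numbered in program order (so that indices are comparable across kernels) are all hypothetical.

```python
from collections import Counter
from math import sqrt


def edge_counts(trace):
    """Count how often each (src, dst) basic-block transition occurs in a trace."""
    return Counter(zip(trace, trace[1:]))


def cfg_similarity(trace_a, trace_b):
    """Cosine similarity between the edge-frequency vectors of two traces.

    Returns a value in [0, 1]; 0.0 if either trace has no transitions.
    """
    a, b = edge_counts(trace_a), edge_counts(trace_b)
    dot = sum(a[e] * b[e] for e in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical traces: basic blocks are numbered in program order (an assumption).
loop_kernel = [0, 1, 1, 1, 2]   # entry -> loop body three times -> exit
straight_kernel = [0, 1, 2]     # entry -> body -> exit, no back edge

score = cfg_similarity(loop_kernel, straight_kernel)
```

Here `score` falls strictly between 0 and 1 because the two kernels share the entry and exit edges, while only the first has a loop back edge. A real comparison would also need to handle kernels with different block counts, which is where subgraph matching comes in.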
