GPA: A GPU Performance Advisor Based on Instruction Sampling
Keren Zhou, Xiaozhu Meng, Ryuichi Sai, John Mellor-Crummey

TL;DR
GPA is a GPU performance advisor that uses instruction sampling and data flow analysis to identify optimization opportunities at multiple code levels, providing actionable suggestions to improve GPU kernel performance.
Contribution
GPA introduces a performance advisor for NVIDIA GPUs that combines instruction sampling with data flow analysis to produce optimization suggestions at the level of individual lines, loops, and functions.
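To illustrate what advice at a hierarchy of code levels rests on, the sketch below rolls per-instruction samples up to line, loop, and function scope. The `Sample` record, its fields, the stall names, and the sample values are all hypothetical, invented for illustration; they are not GPA's data structures or measured data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Sample:
    """One PC sample (hypothetical layout, for illustration only)."""
    function: str
    loop: Optional[str]  # enclosing loop label, or None outside any loop
    line: int
    stall: str           # stall reason recorded for the sampled instruction


def aggregate(samples):
    """Count samples at function, loop, and line scope."""
    levels = {"function": Counter(), "loop": Counter(), "line": Counter()}
    for s in samples:
        levels["function"][s.function] += 1
        if s.loop is not None:
            levels["loop"][(s.function, s.loop)] += 1
        levels["line"][(s.function, s.line)] += 1
    return levels


# Made-up samples: three inside a loop, one outside it.
samples = [
    Sample("kernel", "loop@12", 14, "memory_dependency"),
    Sample("kernel", "loop@12", 14, "memory_dependency"),
    Sample("kernel", "loop@12", 15, "execution_dependency"),
    Sample("kernel", None, 20, "synchronization"),
]
levels = aggregate(samples)
hottest_line = levels["line"].most_common(1)[0]
```

Keeping one counter per scope means the same samples can be ranked at whichever granularity a suggestion targets, without re-reading the profile.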
Findings
Achieved speedups of 1.01× to 3.53× on a V100 GPU
Provides detailed, actionable performance reports
Effective across benchmarks and real applications
Abstract
Developing efficient GPU kernels can be difficult because of the complexity of GPU architectures and programming models. Existing performance tools only provide coarse-grained suggestions at the kernel level, if any. In this paper, we describe GPA, a performance advisor for NVIDIA GPUs that suggests potential code optimization opportunities at a hierarchy of levels, including individual lines, loops, and functions. To relieve users of the burden of interpreting performance counters and analyzing bottlenecks, GPA uses data flow analysis to approximately attribute measured instruction stalls to their root causes and uses information about a program's structure and the GPU to match inefficiency patterns with suggestions for optimization. To quantify each suggestion's potential benefits, we developed PC sampling-based performance models to estimate its speedup. Our experiments with…
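As a rough illustration of how sampling data can be turned into a speedup estimate, the sketch below assumes the simplest possible model: running time is proportional to the number of samples, and an optimization removes some fraction of the stall samples it matches. This is an assumption made for illustration, not a reproduction of GPA's estimators; the function name, parameters, and numbers are hypothetical.

```python
def estimate_speedup(total_samples, matched_stall_samples, removable_fraction=1.0):
    """Estimate speedup if a fraction of the matched stall samples disappears.

    Simplifying assumption: execution time is proportional to sample count.
    """
    if total_samples <= 0:
        raise ValueError("total_samples must be positive")
    if not 0 <= matched_stall_samples <= total_samples:
        raise ValueError("matched_stall_samples must lie in [0, total_samples]")
    if not 0.0 <= removable_fraction <= 1.0:
        raise ValueError("removable_fraction must lie in [0, 1]")

    remaining = total_samples - removable_fraction * matched_stall_samples
    if remaining <= 0:
        raise ValueError("optimization would remove every sample")
    return total_samples / remaining


# Made-up numbers: 200 of 1000 samples are stalls the optimization targets.
best_case = estimate_speedup(1000, 200)        # all matched stalls removed
half_case = estimate_speedup(1000, 200, 0.5)   # only half of them removed
```

Under this model the estimate is an upper bound on what removing those stalls alone could deliver, which is why a removable fraction below 1.0 is exposed as a parameter.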
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Advanced Neural Network Applications
