Analytical Performance Estimation during Code Generation on Modern GPUs
Dominik Ernst, Markus Holzer, Georg Hager, Matthias Knorr, Gerhard Wellein

TL;DR
This paper presents an analytical performance estimation method for GPU code generation that quickly identifies efficient configurations by modeling key performance mechanisms, reducing reliance on autotuning or machine learning.
Contribution
It introduces a performance model coupled with an analytic hardware metric estimator for GPU applications, enabling rapid exploration of configuration spaces with high accuracy.
Findings
Accurately models data transfer volumes on the A100 GPU architecture.
Effectively ranks code candidates for stencil and fluid solver kernels.
Reduces the time needed for performance tuning compared to time-intensive autotuning.
Abstract
Automatic code generation is frequently used to create implementations of algorithms specifically tuned to particular hardware and application parameters. The code generation process involves the selection of adequate code transformations, tuning parameters, and parallelization strategies. We propose an alternative to time-intensive autotuning, scenario-specific performance models, or black-box machine learning to select the best-performing configuration. This paper identifies the relevant performance-defining mechanisms for memory-intensive GPU applications through a performance model coupled with an analytic hardware metric estimator. This enables a quick exploration of large configuration spaces to identify highly efficient code candidates with high accuracy. We examine the changes of the A100 GPU architecture compared to the predecessor V100 and address the challenges of how to…
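To illustrate the general idea of ranking code candidates from analytically estimated data volumes, here is a minimal sketch. It is not the paper's model: it assumes a simple bottleneck estimate in which the time per update is bounded by the slower of DRAM and L2 cache traffic. The candidate names, byte counts, and bandwidth figures are hypothetical placeholders, not measurements or values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A generated kernel variant with hypothetical estimated data volumes."""
    name: str
    dram_bytes_per_update: float  # assumed estimate of DRAM traffic per update
    l2_bytes_per_update: float    # assumed estimate of L2 traffic per update


def estimated_time_per_update(c: Candidate, dram_bw: float, l2_bw: float) -> float:
    """Bottleneck estimate: the slower of the two memory levels sets the time."""
    return max(c.dram_bytes_per_update / dram_bw,
               c.l2_bytes_per_update / l2_bw)


def rank(candidates, dram_bw: float, l2_bw: float):
    """Sort candidates from fastest to slowest estimated time."""
    return sorted(candidates,
                  key=lambda c: estimated_time_per_update(c, dram_bw, l2_bw))


# Placeholder bandwidths in bytes/s (illustrative only, not vendor figures).
DRAM_BW = 1.4e12
L2_BW = 4.0e12

# Hypothetical thread-block configurations with made-up volume estimates.
candidates = [
    Candidate("block_32x8", dram_bytes_per_update=24.0, l2_bytes_per_update=80.0),
    Candidate("block_256x1", dram_bytes_per_update=40.0, l2_bytes_per_update=56.0),
    Candidate("block_16x16", dram_bytes_per_update=16.0, l2_bytes_per_update=96.0),
]

ranked = rank(candidates, DRAM_BW, L2_BW)
for c in ranked:
    t = estimated_time_per_update(c, DRAM_BW, L2_BW)
    print(f"{c.name}: {t:.3e} s per update")
```

In this toy example the candidate with the lowest DRAM volume does not rank first, because its assumed L2 traffic becomes the limiting factor. A ranking of this kind needs only arithmetic on estimated volumes, so it can be evaluated for many configurations without running any kernel.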
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Lattice Boltzmann Simulation Studies
