Opening the Black Box: Performance Estimation during Code Generation for GPUs
Dominik Ernst (1), Georg Hager (1), Markus Holzer (2), Matthias Knorr, Gerhard Wellein (1) ((1) Friedrich-Alexander-Universität Erlangen-Nürnberg, (2) Chair for System Simulation, Friedrich-Alexander-Universität Erlangen-Nürnberg)

TL;DR
This paper presents a performance estimation method for GPU code generation that uses a performance model and hardware metrics to efficiently identify high-performing configurations, reducing reliance on time-consuming autotuning.
Contribution
It introduces a performance modeling approach coupled with an analytic hardware metric estimator for rapid configuration exploration in GPU code generation.
Findings
Accurately ranks GPU kernel configurations for stencil and fluid simulation applications.
Reduces search time for optimal configurations compared to traditional autotuning.
Can be integrated into existing code generators for performance-aware code selection.
Abstract
Automatic code generation is frequently used to create implementations of algorithms specifically tuned to particular hardware and application parameters. The code generation process involves the selection of adequate code transformations, tuning parameters, and parallelization strategies. To cover the huge search space, code generation frameworks may apply time-intensive autotuning, exploit scenario-specific performance models, or treat performance as an intangible black box that must be described via machine learning. This paper addresses the selection problem by identifying the relevant performance-defining mechanisms through a performance model coupled with an analytic hardware metric estimator. This enables a quick exploration of large configuration spaces to identify highly efficient candidates with high accuracy. Our current approach targets memory-intensive GPGPU…
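To illustrate what ranking configurations with a model instead of by measurement can look like, here is a minimal sketch of a bandwidth-limit estimate for memory-bound kernels. It is not the estimator from the paper. The names `KernelConfig`, `estimate_updates_per_second`, and `rank_configs`, the per-configuration traffic figures, and the bandwidth value are all hypothetical and chosen only for illustration. The sketch assumes that each candidate's memory traffic per grid-point update is already known, whereas the paper's approach derives such hardware metrics analytically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KernelConfig:
    """One candidate kernel configuration (hypothetical structure)."""
    name: str
    bytes_per_update: float  # assumed DRAM traffic per grid-point update, in bytes


def estimate_updates_per_second(cfg: KernelConfig, bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-limit estimate: a memory-bound kernel cannot update grid points
    faster than the memory bandwidth divided by the traffic per update."""
    if cfg.bytes_per_update <= 0:
        raise ValueError("bytes_per_update must be positive")
    return bandwidth_bytes_per_s / cfg.bytes_per_update


def rank_configs(configs, bandwidth_bytes_per_s: float):
    """Return the configurations sorted from highest to lowest estimated throughput."""
    return sorted(
        configs,
        key=lambda cfg: estimate_updates_per_second(cfg, bandwidth_bytes_per_s),
        reverse=True,
    )


# Illustrative inputs only: invented traffic volumes and an assumed 1 TB/s bandwidth.
candidates = [
    KernelConfig("config_a", bytes_per_update=24.0),
    KernelConfig("config_b", bytes_per_update=16.0),
    KernelConfig("config_c", bytes_per_update=40.0),
]
assumed_bandwidth = 1.0e12  # bytes per second, hypothetical

for cfg in rank_configs(candidates, assumed_bandwidth):
    estimate = estimate_updates_per_second(cfg, assumed_bandwidth)
    print(f"{cfg.name}: {estimate:.3e} updates/s (estimated)")
```

Because such an estimate needs no kernel execution, a large configuration space can be scored quickly and only the top-ranked candidates need to be generated and measured.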
Taxonomy
Topics: Lattice Boltzmann Simulation Studies · Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques
