A Unified, Hardware-Fitted, Cross-GPU Performance Model
James Stevens, Andreas Klöckner

TL;DR
This paper introduces a unified, hardware-fitted performance model for GPUs that predicts kernel run time from symbolic operation counts. The model applies across multiple hardware generations and vendors and achieves accuracy comparable to that of hardware-specific models.
Contribution
The authors develop a symbolic, linear performance model whose parameters are fitted to each GPU. This allows a single model form to predict kernel execution times on different platforms.
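To make "a linear model over symbolic operation counts" concrete, here is a minimal Python sketch. It assumes each count can be represented as a function of problem-size parameters; the lambdas below are a hypothetical stand-in for the symbolic counts Loopy produces, not its API. The feature names and per-operation weights are made up for illustration, and the paper's actual feature set is richer.

```python
from typing import Callable, Dict

# Hypothetical stand-in for symbolic operation counts: each count is a
# function of the problem-size parameters (NOT Loopy's real count objects).
Counts = Dict[str, Callable[[Dict[str, int]], float]]

# Naive n x n matrix multiply, counted per inner-loop iteration.
matmul_counts: Counts = {
    "f32_flops": lambda p: 2 * p["n"] ** 3,          # one multiply + one add
    "f32_global_loads": lambda p: 2 * p["n"] ** 3,   # A[i,k] and B[k,j], no reuse
    "f32_global_stores": lambda p: p["n"] ** 2,      # each C[i,j] written once
}

def predict_time(counts: Counts, weights: Dict[str, float],
                 params: Dict[str, int]) -> float:
    """Linear model: predicted time = sum over features of weight * count."""
    return sum(weights[name] * count(params) for name, count in counts.items())

# Hypothetical per-operation costs in seconds; in a hardware-fitted model
# these would be fitted separately for each GPU.
weights = {"f32_flops": 1e-13, "f32_global_loads": 5e-13, "f32_global_stores": 1e-12}

print(predict_time(matmul_counts, weights, {"n": 1024}))
```

Because the counts stay symbolic in `n`, the same fitted weights can predict run time at any problem size without re-counting.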
Findings
Model achieves accuracy comparable to that of hardware-specific models (in terms of the geometric mean)
Applicable across multiple GPU vendors and generations
Uses symbolic operation counts for performance prediction
Abstract
We present a mechanism to symbolically gather performance-relevant operation counts from numerically oriented subprograms ('kernels') expressed in the Loopy programming system, and apply these counts in a simple, linear model of kernel run time. We use a series of 'performance-instructive' kernels to fit the parameters of a unified model to the performance characteristics of GPU hardware from multiple hardware generations and vendors. We evaluate the predictive power of the model on a broad array of computational kernels relevant to scientific computing. In terms of the geometric mean, our simple, vendor- and GPU-type-independent model achieves relative accuracy comparable to that of previously published work using hardware-specific models.
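The fit-then-evaluate workflow described in the abstract can be sketched as follows: fit the model weights by least squares on a set of training kernels, then score predictions on held-out kernels with the geometric mean of the relative error. This is a minimal sketch under stated assumptions: the two features (GFLOPs and GB moved), the kernels, and all timings are synthetic values invented for illustration, not data from the paper, and the paper's exact fitting procedure may differ.

```python
import math
from typing import List, Sequence

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a small square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_weights(features: Sequence[Sequence[float]], times: Sequence[float]) -> List[float]:
    """Least-squares fit of times ~ features @ w via the normal equations (X^T X) w = X^T t."""
    k = len(features[0])
    xtx = [[sum(row[i] * row[j] for row in features) for j in range(k)] for i in range(k)]
    xtt = [sum(row[i] * t for row, t in zip(features, times)) for i in range(k)]
    return solve(xtx, xtt)

def predict(features: Sequence[Sequence[float]], w: Sequence[float]) -> List[float]:
    """Apply the linear model to each kernel's feature vector."""
    return [sum(wi * fi for wi, fi in zip(w, row)) for row in features]

def geometric_mean(values: Sequence[float]) -> float:
    """Geometric mean; assumes all values are strictly positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Synthetic training data from hypothetical "performance-instructive" kernels.
# Features per kernel: [GFLOPs executed, GB moved to/from global memory].
train_x = [[10.0, 0.1], [0.1, 4.0], [5.0, 2.0], [2.0, 1.0]]
train_t = [1.275, 9.7097, 5.555, 2.673]  # made-up "measured" run times in ms

w = fit_weights(train_x, train_t)  # fitted cost per GFLOP and per GB, in ms

# Evaluate on held-out kernels (also synthetic).
test_x = [[8.0, 0.5], [1.0, 3.0]]
test_t = [2.1, 7.9]
rel_err = [abs(p - m) / m for p, m in zip(predict(test_x, w), test_t)]
print("weights:", w)
print("geometric-mean relative error:", geometric_mean(rel_err))
```

The geometric mean damps the influence of a few badly predicted kernels compared with an arithmetic mean, which is one reason it is a common summary for relative error. Training kernels that stress different features (here one compute-heavy and one memory-heavy) keep the least-squares system well conditioned.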
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Advanced Data Storage Technologies
