TL;DR
This paper benchmarks 16 optimization algorithms for auto-tuning GPU kernels, compares their performance across multiple GPU architectures and kernel search spaces, and introduces a new metric for assessing problem difficulty.
Contribution
It provides an experimental comparison of 16 optimization algorithms for GPU kernel auto-tuning and proposes a novel PageRank-based metric for assessing the difficulty of a tuning problem.
Findings
Which algorithm produces the fastest kernels depends on the time budget available for tuning.
The PageRank-based metric correlates strongly with tuning performance.
The study offers insights into the difficulty of GPU kernel optimization problems.
Abstract
Recent years have witnessed phenomenal growth in the application and capabilities of Graphics Processing Units (GPUs), owing to their high parallel computation power at relatively low cost. However, writing a computationally efficient GPU program (kernel) is challenging, and generally only certain specific kernel configurations lead to significant increases in performance. Auto-tuning is the process of automatically optimizing software for highly efficient execution on a target hardware platform. Auto-tuning is particularly useful for GPU programming, as a single kernel requires re-tuning after code changes, for different input data, and for different architectures. However, the discrete and non-convex nature of the search space creates a challenging optimization problem. In this work, we investigate which algorithm produces the fastest kernels if the time-budget for the tuning task is…
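To make the tuning problem described in the abstract concrete, the following is a minimal sketch of a budget-limited search over a discrete configuration space. Everything in it is an illustrative assumption rather than the paper's setup: the parameter names, the `synthetic_runtime` cost function (a stand-in for compiling and timing a real kernel), and the use of plain random search. The budget is counted in evaluations here, a simplification of a wall-clock time budget.

```python
import itertools
import random


def synthetic_runtime(config):
    """Hypothetical stand-in for compiling and timing a kernel (no GPU involved)."""
    block_x, block_y, tile = config
    # Toy cost, non-monotone in the tile factor; its unique minimum is (64, 4, 3).
    return abs(block_x - 64) / 16 + abs(block_y - 4) + (tile % 3) * 0.5 + 1.0


def random_search(space, measure, budget, seed=0):
    """Evaluate at most `budget` distinct configurations and return the fastest."""
    rng = random.Random(seed)
    candidates = rng.sample(space, min(budget, len(space)))
    best = min(candidates, key=measure)
    return best, measure(best)


# Hypothetical tunable parameters: thread-block dimensions and a tiling factor.
space = list(itertools.product([16, 32, 64, 128], [1, 2, 4, 8], [1, 2, 3, 4]))

for budget in (5, 20, len(space)):
    config, runtime = random_search(space, synthetic_runtime, budget)
    print(f"budget={budget:3d} best={config} runtime={runtime:.2f}")
```

With a budget equal to the size of the space, the search is exhaustive and must return the global optimum; with smaller budgets the result depends on which configurations happen to be sampled, which is why the choice of search algorithm matters when tuning time is limited.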
