Using hardware performance counters to speed up autotuning convergence on GPUs
Jiří Filipovič, Jana Hozzová, Amin Nezarat, Jaroslav Oľha, Filip Petrovič

TL;DR
This paper presents a novel autotuning method for GPUs that leverages hardware performance counters to efficiently navigate tuning spaces, reducing convergence time and improving portability across hardware and data variations.
Contribution
The authors introduce a tuning-space search method that uses hardware performance counters to guide autotuning, speeding up convergence and keeping performance portable across different GPU architectures and data characteristics.
Findings
Method reduces autotuning convergence time.
Outperforms state-of-the-art search techniques.
Effective across diverse GPU architectures and data characteristics.
Abstract
Nowadays, GPU accelerators are commonly used to speed up general-purpose computing tasks on a variety of hardware. However, due to the diversity of GPU architectures and processed data, optimization of codes for a particular type of hardware and specific data characteristics can be extremely challenging. The autotuning of performance-relevant source-code parameters allows for automatic optimization of applications and keeps their performance portable. Although the autotuning process typically results in code speed-up, searching the tuning space can bring unacceptable overhead if (i) the tuning space is vast and full of poorly-performing implementations, or (ii) the autotuning process has to be repeated frequently because of changes in processed data or migration to different hardware. In this paper, we introduce a novel method for searching tuning spaces. The method takes advantage of…
