Using hardware performance counters to speed up autotuning convergence on GPUs

Jiří Filipovič; Jana Hozzová; Amin Nezarat; Jaroslav Oľha; Filip Petrovič

arXiv:2102.05297·cs.DC·September 20, 2021

TL;DR

This paper presents a novel autotuning method for GPUs that leverages hardware performance counters to efficiently navigate tuning spaces, reducing convergence time and improving portability across hardware and data variations.

Contribution

The authors introduce a new approach that uses hardware performance counters to guide autotuning, speeding up convergence and improving performance portability across different GPU architectures and input data characteristics.

Findings

1. Method reduces autotuning convergence time.
2. Outperforms state-of-the-art search techniques.
3. Effective across diverse GPU architectures and data characteristics.

Abstract

Nowadays, GPU accelerators are commonly used to speed up general-purpose computing tasks on a variety of hardware. However, due to the diversity of GPU architectures and processed data, optimization of codes for a particular type of hardware and specific data characteristics can be extremely challenging. The autotuning of performance-relevant source-code parameters allows for automatic optimization of applications and keeps their performance portable. Although the autotuning process typically results in code speed-up, searching the tuning space can bring unacceptable overhead if (i) the tuning space is vast and full of poorly-performing implementations, or (ii) the autotuning process has to be repeated frequently because of changes in processed data or migration to different hardware. In this paper, we introduce a novel method for searching tuning spaces. The method takes advantage of…
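To make the idea of a "tuning space" concrete, the sketch below enumerates one as the Cartesian product of tuning-parameter values, pruned by a validity constraint, and searches it with plain random sampling. This is a generic baseline, not the counter-guided method the paper proposes. The parameter names, the constraint, and `toy_runtime` are all hypothetical: a real autotuner would compile and time each configuration on the GPU instead of calling a toy cost function.

```python
import itertools
import random

# Hypothetical tuning parameters for a GPU kernel (illustrative only, not taken from the paper).
TUNING_PARAMS = {
    "block_size": [32, 64, 128, 256, 512],
    "tile_size": [1, 2, 4, 8],
    "unroll": [1, 2, 4],
    "use_shared_mem": [False, True],
}


def is_valid(cfg):
    # Hypothetical constraint: prune configurations whose per-block work exceeds 1024 elements.
    return cfg["block_size"] * cfg["tile_size"] <= 1024


def build_tuning_space(params, constraint):
    """Enumerate the Cartesian product of parameter values, keeping only valid configurations."""
    names = list(params)
    space = []
    for values in itertools.product(*(params[n] for n in names)):
        cfg = dict(zip(names, values))
        if constraint(cfg):
            space.append(cfg)
    return space


def toy_runtime(cfg):
    # Hypothetical stand-in for compiling and timing the kernel on real hardware.
    penalty = 0.0 if cfg["use_shared_mem"] else 5.0
    return abs(cfg["block_size"] - 128) / 32 + 8 / cfg["tile_size"] + cfg["unroll"] + penalty


def random_search(space, measure, budget, seed=0):
    """Baseline search: evaluate `budget` randomly chosen configurations, return the fastest."""
    rng = random.Random(seed)
    sample = rng.sample(space, min(budget, len(space)))
    best = min(sample, key=measure)
    return best, measure(best)


space = build_tuning_space(TUNING_PARAMS, is_valid)
best_cfg, best_time = random_search(space, toy_runtime, budget=10)
print(len(space), best_cfg, best_time)
```

Even this small example has 102 valid configurations out of 120 raw combinations; with `budget=10` the random baseline evaluates fewer than 10% of them and may miss the best one. In realistic spaces, with thousands of configurations and each evaluation requiring a kernel compile and run, this is the search overhead the abstract describes.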
