GPU Performance Portability needs Autotuning

Burkhard Ringlein; Thomas Parnell; Radu Stoica

arXiv:2505.03780 · cs.AR · July 18, 2025

TL;DR

This paper advocates combining JIT compilation with autotuning to achieve portable, high-performance LLM inference across different GPU hardware, reducing manual optimization and vendor lock-in.

Contribution

It introduces a method that uses autotuning with JIT compilation to enable portable and efficient LLM kernel execution across diverse GPU platforms.
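The core loop of empirical autotuning can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the paper's implementation or the triton-dejavu API: `Autotuner`, `wall_clock`, and the parameter names `BLOCK_SIZE` and `num_warps` are assumptions. Every candidate configuration is benchmarked the first time a cache key (for example, device name plus input shape) is seen, and the fastest one is reused on later calls. This is one way a tuned result can be kept per platform without editing the kernel source.

```python
import time
from typing import Callable, Dict, Hashable, List

# A kernel configuration, e.g. {"BLOCK_SIZE": 128, "num_warps": 4} (hypothetical names).
Config = Dict[str, int]


def wall_clock(run: Callable[[Config], None], repeats: int = 3) -> Callable[[Config], float]:
    """Turn a kernel launcher into a cost function: best-of-`repeats` wall time in seconds."""
    def measure(cfg: Config) -> float:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(cfg)
            best = min(best, time.perf_counter() - start)
        return best
    return measure


class Autotuner:
    """Benchmarks every candidate config the first time a cache key is seen
    (e.g. device name + input shape) and reuses the fastest config afterwards."""

    def __init__(self, configs: List[Config]):
        self.configs = configs
        self.cache: Dict[Hashable, Config] = {}

    def best_config(self, key: Hashable, measure: Callable[[Config], float]) -> Config:
        if key not in self.cache:
            # Exhaustive search: evaluate each candidate once, keep the lowest cost.
            self.cache[key] = min(self.configs, key=measure)
        return self.cache[key]


if __name__ == "__main__":
    def launch(cfg: Config) -> None:
        # Stand-in for a JIT-compiled kernel launch; a real setup would run on the GPU.
        sum(range(cfg["BLOCK_SIZE"] * cfg["num_warps"]))

    configs = [{"BLOCK_SIZE": b, "num_warps": w} for b in (64, 128, 256) for w in (2, 4, 8)]
    tuner = Autotuner(configs)
    print(tuner.best_config(("cpu-demo", 1024), wall_clock(launch)))
```

Passing the cost function in, rather than hard-wiring timing into the tuner, keeps the search logic independent of how a configuration is measured.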

Findings

01. Explores up to 15x more kernel configurations.
02. Outperforms vendor-optimized implementations by up to 230%.
03. Reduces kernel code size by 70x.

Abstract

As LLMs grow in complexity, achieving state-of-the-art performance requires tight co-design across algorithms, software, and hardware. Today's reliance on a single dominant platform limits portability, creates vendor lock-in, and raises barriers for new AI hardware. In this work, we make the case for combining just-in-time (JIT) compilation with comprehensive kernel parameter autotuning to enable portable LLM inference with state-of-the-art performance without code changes. Focusing on performance-critical LLM kernels, we demonstrate that this approach explores up to 15x more kernel parameter configurations, produces significantly more diverse code across multiple dimensions, and even outperforms vendor-optimized implementations by up to 230%, all while reducing kernel code size by 70x and eliminating manual code optimizations. Our results highlight autotuning as a promising path to…
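The phrase "kernel parameter configurations" can be made concrete: a search space is commonly the Cartesian product of a few tuning parameters, pruned by what the target device can actually run. The sketch below is hypothetical; the parameter names, value ranges, the fp16 shared-memory estimate with a fixed `block_k` of 64, and the 48 KiB budget are illustrative assumptions, not values from the paper. Because the validity check takes the device limit as input, the same space definition yields a different candidate set on each GPU while the kernel source stays unchanged.

```python
import math
from itertools import product
from typing import Callable, Dict, List, Sequence

Config = Dict[str, int]

# Hypothetical search space for a tiled matrix-multiply kernel; names are illustrative.
SEARCH_SPACE: Dict[str, Sequence[int]] = {
    "BLOCK_M": (16, 32, 64, 128),
    "BLOCK_N": (16, 32, 64, 128),
    "num_warps": (2, 4, 8),
    "num_stages": (1, 2, 3, 4),
}


def smem_bytes(cfg: Config, block_k: int = 64, elem_bytes: int = 2) -> int:
    """Rough shared-memory estimate: one A tile (BLOCK_M x block_k) and one
    B tile (block_k x BLOCK_N) per pipeline stage, fp16 elements by default."""
    return (cfg["BLOCK_M"] + cfg["BLOCK_N"]) * block_k * elem_bytes * cfg["num_stages"]


def enumerate_configs(space: Dict[str, Sequence[int]],
                      is_valid: Callable[[Config], bool]) -> List[Config]:
    """Cartesian product of all parameter values, keeping only configs the device can run."""
    names = list(space)
    configs = []
    for values in product(*(space[name] for name in names)):
        cfg = dict(zip(names, values))
        if is_valid(cfg):
            configs.append(cfg)
    return configs


if __name__ == "__main__":
    budget = 48 * 1024  # assumed per-block shared-memory budget of the target device
    fitting = enumerate_configs(SEARCH_SPACE, lambda c: smem_bytes(c) <= budget)
    total = math.prod(len(v) for v in SEARCH_SPACE.values())
    print(f"{len(fitting)} of {total} configurations fit the budget")
```

Under these assumptions, 144 of the 192 raw combinations pass the shared-memory check; raising the budget to 96 KiB admits 189.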

Code & Models

Repositories

IBM/triton-dejavu (PyTorch, official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Advanced Data Storage Technologies

Methods: Softmax · Attention Is All You Need