# SwizzlePerf: Hardware-Aware LLMs for GPU Kernel Performance Optimization

**Authors:** Arya Tschand, Muhammad Awad, Ryan Swann, Kesavan Ramakrishnan, Jeffrey Ma, Keith Lowery, Ganesh Dasika, Vijay Janapa Reddi

arXiv: 2508.20258 · 2025-08-29

## TL;DR

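SwizzlePerf gives large language models explicit hardware awareness (memory access patterns, architecture specifications, filtered profiling logs, and reflections on past performance) so they can automatically generate swizzling patterns for GPU kernels. On a suite of 10 ML and science kernels, it reaches up to a 2.06x speedup and up to a 70% improvement in L2 hit rate.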

## Contribution

The paper introduces a hardware-aware, LLM-based approach to GPU kernel optimization that automatically generates spatial optimizations (swizzling patterns) tailored to disaggregated GPU architectures.

## Key findings

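- Generates swizzling patterns for 9 of 10 diverse ML and science kernels, with up to a 2.06x speedup.
- For a GEMM kernel, finds in under 5 minutes the same hardware-specific optimal swizzling pattern that took expert performance engineers 2 weeks.
- Improves L2 hit rate by up to 70%.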

## Abstract

Large language models (LLMs) have shown progress in GPU kernel performance engineering using inefficient search-based methods that optimize around runtime. Any existing approach lacks a key characteristic that human performance engineers rely on for near-optimal utilization -- hardware-awareness. By leveraging the workload's specific memory access patterns, architecture specifications, filtered profiling logs, and reflections on historical performance, we can make software-level optimizations that are tailored to the underlying hardware. SwizzlePerf automatically generates spatial optimizations for GPU kernels on disaggregated architectures by giving LLMs explicit hardware-awareness. For a GEMM kernel, SwizzlePerf takes less than 5 minutes to generate the same hardware-specific optimal swizzling pattern that took expert performance engineers 2 weeks to find. On a suite of 10 diverse ML and Science kernels, SwizzlePerf can generate swizzling patterns for 9 of the kernels that achieve up to a 2.06x speedup and 70% improvement in L2 hit rate. This work is the first of many steps toward systematically creating hardware-aware LLM performance engineering agents.
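
This summary does not show what a swizzling pattern looks like, so here is a minimal illustrative sketch in Python. It is not the pattern SwizzlePerf generates. It assumes a disaggregated GPU whose scheduler hands workgroups to chiplets round-robin (hardware workgroup `i` runs on chiplet `i % num_xcds`); that dispatch policy and the names `xcd_of`, `swizzle_pid`, and `tiles_per_xcd` are all hypothetical. The remap reorders which logical tile each workgroup processes so that consecutive tiles end up on the same chiplet and can share that chiplet's L2 cache.

```python
def xcd_of(hw_pid: int, num_xcds: int) -> int:
    """Chiplet that runs a workgroup, ASSUMING round-robin dispatch."""
    return hw_pid % num_xcds


def swizzle_pid(hw_pid: int, num_wgs: int, num_xcds: int) -> int:
    """Map a hardware workgroup id to the logical tile it should process.

    Hypothetical remap: chiplet k receives the contiguous tile range
    [k * wgs_per_xcd, (k + 1) * wgs_per_xcd).
    """
    if num_wgs % num_xcds != 0:
        raise ValueError("this sketch assumes num_wgs is a multiple of num_xcds")
    wgs_per_xcd = num_wgs // num_xcds
    xcd = hw_pid % num_xcds    # chiplet this workgroup lands on (assumed)
    slot = hw_pid // num_xcds  # arrival order of this workgroup on that chiplet
    return xcd * wgs_per_xcd + slot


def tiles_per_xcd(num_wgs: int, num_xcds: int, use_swizzle: bool) -> dict[int, list[int]]:
    """Group logical tile ids by the chiplet that ends up processing them."""
    groups: dict[int, list[int]] = {x: [] for x in range(num_xcds)}
    for hw_pid in range(num_wgs):
        tile = swizzle_pid(hw_pid, num_wgs, num_xcds) if use_swizzle else hw_pid
        groups[xcd_of(hw_pid, num_xcds)].append(tile)
    return groups


if __name__ == "__main__":
    print("no swizzle:", tiles_per_xcd(16, 4, use_swizzle=False))
    print("swizzled:  ", tiles_per_xcd(16, 4, use_swizzle=True))
```

With 16 workgroups and 4 chiplets, chiplet 0 processes tiles 0, 4, 8, 12 without the remap and tiles 0, 1, 2, 3 with it. Whether grouping contiguous tiles actually raises the L2 hit rate depends on the kernel's memory access pattern and on the real hardware's dispatch policy. Those are the details the abstract says SwizzlePerf supplies to the LLM.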

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.20258/full.md

## Figures

29 figures with captions in the complete paper: https://tomesphere.com/paper/2508.20258/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/2508.20258/full.md

---
Source: https://tomesphere.com/paper/2508.20258