GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization

Martin Andrews; Sam Witteveen

arXiv:2506.20807 · cs.LG · August 25, 2025

TL;DR

This paper presents an LLM-driven framework called GPU Kernel Scientist that automates iterative GPU kernel optimization, especially for new or less-documented architectures, by combining hypothesis generation, code modification, and performance evaluation.

Contribution

The paper introduces a multi-stage, evolutionary methodology that uses LLMs to automate and accelerate GPU kernel optimization.

Findings

1. Successfully optimized kernels for the AMD MI300 architecture

2. Demonstrated automation of hypothesis generation and code modification

3. Showed potential to democratize GPU optimization in resource-constrained settings

Abstract

Optimizing GPU kernels for high performance is a complex task, often demanding deep architectural knowledge, extensive profiling, and iterative experimentation. This challenge is amplified when targeting newer or less-documented GPU architectures where traditional development aids are scarce. This paper introduces an LLM-powered "GPU Kernel Scientist," an automated methodology for iteratively refining accelerator kernels. Our methodology employs LLMs in a multi-stage, evolutionary process: (a) strategically selecting promising prior code versions as a basis for new iterations; (b) generating hypotheses for optimization experiments, based on existing code and assimilated knowledge from general GPU literature; and (c) autonomously implementing these experiments through code modification and subsequent submission to an external evaluation system, using only observed timing data as…
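The three stages named in the abstract (select a prior version, generate a hypothesis, implement and evaluate it) can be pictured as a simple loop. The sketch below is illustrative only and is not the authors' implementation: every name in it (`KernelVersion`, `select_parent`, `run_loop`, and the `toy_*` functions) is hypothetical, the LLM calls are replaced by plain Python stand-ins, and the external evaluation system is replaced by a toy timing function.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class KernelVersion:
    """One candidate kernel plus the only feedback the loop sees: its timing."""
    source: str
    time_us: float
    parent: Optional[int] = None   # index of the version it was derived from
    hypothesis: str = ""           # the experiment that produced this version


def select_parent(population: List[KernelVersion], rng: random.Random) -> int:
    """Stage (a): pick a promising prior version.

    Assumption: a random choice among the faster half stands in for the
    LLM-based selection described in the abstract.
    """
    ranked = sorted(range(len(population)), key=lambda i: population[i].time_us)
    faster_half = ranked[: max(1, len(ranked) // 2)]
    return rng.choice(faster_half)


def run_loop(
    seed_source: str,
    propose: Callable[[str], str],         # stage (b): source -> hypothesis
    implement: Callable[[str, str], str],  # stage (c): (source, hypothesis) -> new source
    evaluate: Callable[[str], float],      # external evaluator: source -> time in microseconds
    iterations: int,
    rng: random.Random,
) -> List[KernelVersion]:
    """Run the select / hypothesize / implement / evaluate cycle."""
    population = [KernelVersion(seed_source, evaluate(seed_source))]
    for _ in range(iterations):
        idx = select_parent(population, rng)
        parent = population[idx]
        hypothesis = propose(parent.source)
        child_source = implement(parent.source, hypothesis)
        child_time = evaluate(child_source)
        population.append(KernelVersion(child_source, child_time, idx, hypothesis))
    return population


# Toy stand-ins so the sketch runs without an LLM or a GPU (all hypothetical).
def toy_propose(source: str) -> str:
    return "unroll the inner loop"


def toy_implement(source: str, hypothesis: str) -> str:
    return source + "\n// opt: " + hypothesis


def toy_evaluate(source: str) -> float:
    # Pretend each applied optimization makes the kernel faster.
    return 100.0 / (1 + source.count("// opt:"))


if __name__ == "__main__":
    history = run_loop("__global__ void k() {}", toy_propose, toy_implement,
                       toy_evaluate, iterations=5, rng=random.Random(0))
    best = min(history, key=lambda v: v.time_us)
    print(f"best time: {best.time_us:.1f} us after {len(history) - 1} experiments")
```

Keeping the full history, rather than only the current best, is what lets the selection step return to an earlier version when a recent line of experiments stops paying off.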

Taxonomy

Topics: Advanced Neural Network Applications