GPU Kernel Scientist: An LLM-Driven Framework for Iterative Kernel Optimization
Martin Andrews, Sam Witteveen

TL;DR
This paper presents an LLM-driven framework called GPU Kernel Scientist that automates iterative GPU kernel optimization, especially for new or less-documented architectures, by combining hypothesis generation, code modification, and performance evaluation.
Contribution
The paper introduces a multi-stage, evolutionary methodology that uses LLMs to automate and accelerate GPU kernel optimization.
Findings
Successfully optimized kernels for the AMD MI300 architecture
Demonstrated automation of hypothesis generation and code modification
Showed potential to democratize GPU optimization in resource-constrained settings
Abstract
Optimizing GPU kernels for high performance is a complex task, often demanding deep architectural knowledge, extensive profiling, and iterative experimentation. This challenge is amplified when targeting newer or less-documented GPU architectures where traditional development aids are scarce. This paper introduces an LLM-powered "GPU Kernel Scientist," an automated methodology for iteratively refining accelerator kernels. Our methodology employs LLMs in a multi-stage, evolutionary process: (a) strategically selecting promising prior code versions as a basis for new iterations; (b) generating hypotheses for optimization experiments, based on existing code and assimilated knowledge from general GPU literature; and (c) autonomously implementing these experiments through code modification and subsequent submission to an external evaluation system, using only observed timing data as…
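The three stages named in the abstract, (a) selecting a prior version, (b) generating hypotheses, and (c) implementing and timing each experiment, can be read as a simple evolutionary loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: every name here (`Candidate`, `optimize`, and the `toy_*` functions) is hypothetical, the LLM calls are replaced by deterministic stand-ins, and the external evaluation system is replaced by a toy timing function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Candidate:
    code: str                      # kernel source for this version
    time_us: float                 # observed timing (the only feedback signal)
    parent: Optional[int] = None   # index of the version it was derived from


def optimize(
    seed_code: str,
    select: Callable[[List[Candidate]], int],
    propose: Callable[[str], List[str]],
    apply_edit: Callable[[str, str], str],
    evaluate: Callable[[str], float],
    rounds: int = 3,
) -> Candidate:
    """Run the select -> hypothesize -> implement/evaluate loop."""
    population = [Candidate(seed_code, evaluate(seed_code))]
    for _ in range(rounds):
        # (a) pick a promising prior version as the basis for this round
        base_idx = select(population)
        base = population[base_idx]
        # (b) generate optimization hypotheses for that version
        for hypothesis in propose(base.code):
            # (c) implement the experiment and record its timing
            new_code = apply_edit(base.code, hypothesis)
            population.append(Candidate(new_code, evaluate(new_code), base_idx))
    return min(population, key=lambda c: c.time_us)


# Hypothetical stand-ins; a real system would call an LLM and a benchmark harness.
def toy_select(pop: List[Candidate]) -> int:
    return min(range(len(pop)), key=lambda i: pop[i].time_us)


def toy_propose(code: str) -> List[str]:
    return ["tile", "unroll"]


def toy_apply(code: str, hypothesis: str) -> str:
    return code + f"\n// applied: {hypothesis}"


def toy_evaluate(code: str) -> float:
    # Toy model: each applied "tile" saves 10 us, each "unroll" saves 5 us.
    return 100.0 - 10.0 * code.count("tile") - 5.0 * code.count("unroll")


best = optimize("kernel_v0", toy_select, toy_propose, toy_apply, toy_evaluate)
print(best.time_us, best.parent)
```

Keeping the full population, rather than only the current best, is what lets the selection step return to an earlier version when a line of experiments stalls. The greedy `toy_select` shown here is only a placeholder for that choice.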
Taxonomy
Topics: Advanced Neural Network Applications
