GPU Kernel Optimization Beyond Full Builds: An LLM Framework with Minimal Executable Programs
Ruifan Chu, Anbang Wang, Xiuxiu Bai, Shuai Liu, Xiaoshe Dong

TL;DR
This paper introduces an LLM-based framework that optimizes GPU kernels by completing extracted hotspot kernels into minimal executable programs, so kernels can be tuned across platforms without full application builds. It reports speedups of up to 7.77x on benchmark kernels.
Contribution
It presents a novel end-to-end LLM framework that automatically completes, optimizes, and validates GPU kernels as minimal executable programs without full application recompilation.
Findings
Achieves up to 7.77x speedup on benchmark kernels
Reduces search cost through reuse of optimization strategies
Enables cross-platform GPU kernel optimization without full builds
Abstract
In high-performance computing, hotspot GPU kernels are primary bottlenecks, and expert manual tuning is costly and hard to port. Large language model methods often assume kernels can be compiled and executed cheaply, which fails in large applications where full builds and runs are expensive. We present an end-to-end LLM framework with performance feedback that optimizes kernels without building the full application. From independently extracted hotspot kernels, it automatically completes code into a Minimal Executable Program (MEP), then performs multi-round iterative optimization and evaluation outside the full application. The framework integrates Automatic Error Repair and Performance Pattern Inheritance to fix faults, preserve correctness, reuse effective tiling/memory/synchronization strategies, and reduce search cost. Optimized variants are reintegrated into the original…
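The loop described in the abstract (complete the kernel into an MEP, propose a variant, repair it if it fails, measure it, and carry forward the strategies that helped) can be sketched roughly as follows. This is an illustrative outline only, not the authors' implementation. Every name here (`Candidate`, `optimize_kernel`, `propose`, `build_and_run`, `repair`) is hypothetical, and the three callables stand in for the LLM calls and the MEP build/run/validation step, whose actual interfaces are not given in the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    source: str                      # kernel source embedded in the MEP
    runtime_ms: float                # measured runtime of that MEP
    patterns: List[str] = field(default_factory=list)  # strategies kept so far


def optimize_kernel(
    baseline_source: str,
    # Hypothetical LLM call: (current source, inherited patterns) -> (new source, pattern name)
    propose: Callable[[str, List[str]], Tuple[str, str]],
    # Hypothetical MEP step: source -> (built and validated?, runtime in ms, error log)
    build_and_run: Callable[[str], Tuple[bool, float, str]],
    # Hypothetical LLM call: (failing source, error log) -> repaired source
    repair: Callable[[str, str], str],
    rounds: int = 3,
    max_repairs: int = 2,
) -> Candidate:
    """Multi-round optimization driven by performance feedback (assumed structure)."""
    ok, runtime, log = build_and_run(baseline_source)
    if not ok:
        raise RuntimeError("baseline MEP failed: " + log)
    best = Candidate(baseline_source, runtime)

    for _ in range(rounds):
        # Pattern inheritance: the proposer sees which strategies already paid off.
        source, pattern = propose(best.source, best.patterns)
        ok, runtime, log = build_and_run(source)

        # Automatic error repair: feed the failure log back, up to a fixed budget.
        attempts = 0
        while not ok and attempts < max_repairs:
            source = repair(source, log)
            ok, runtime, log = build_and_run(source)
            attempts += 1

        # Keep a variant only if it is valid and faster than the current best.
        if ok and runtime < best.runtime_ms:
            best = Candidate(source, runtime, best.patterns + [pattern])

    return best
```

In this sketch a variant that still fails after the repair budget is simply discarded, and the search continues from the last valid best candidate. The greedy acceptance rule is an assumption made for brevity; the paper may use a different selection policy.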
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Big Data and Digital Economy
