Making LLMs Optimize Multi-Scenario CUDA Kernels Like Experts
Yuxuan Han, Meng-Hao Guo, Zhengning Liu, Wenguang Chen, Shi-Min Hu

TL;DR
This paper introduces CUDAMaster, a system that automates GPU kernel optimization across multiple scenarios, achieving about 35% speedup over Astra and matching or exceeding cuBLAS in several cases.
Contribution
The paper develops MSKernelBench, a comprehensive benchmark for multi-scenario GPU kernels, and proposes CUDAMaster, a hardware-aware, multi-agent system for automated kernel optimization.
Findings
CUDAMaster achieves about 35% speedup over Astra.
It matches or exceeds the performance of cuBLAS in several cases.
The benchmark covers diverse applications including scientific computing routines.
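A figure such as "about 35% speedup" is usually read as a ratio of execution times. The sketch below shows that conventional calculation; it is not the paper's evaluation code. The function names and the timings are hypothetical, chosen only so the ratio comes out near 1.35, and the paper may define or aggregate its metric differently.

```python
# Minimal sketch of a conventional speedup calculation.
# Assumption: speedup = baseline time / optimized time. The paper's exact
# metric and aggregation across kernels are not given in this summary.

def speedup(baseline_ms: float, optimized_ms: float) -> float:
    """Return how many times faster the optimized kernel runs."""
    if baseline_ms <= 0 or optimized_ms <= 0:
        raise ValueError("timings must be positive")
    return baseline_ms / optimized_ms


def percent_faster(baseline_ms: float, optimized_ms: float) -> float:
    """Express the speedup as a percentage gain over the baseline."""
    return (speedup(baseline_ms, optimized_ms) - 1.0) * 100.0


if __name__ == "__main__":
    # Hypothetical timings in milliseconds, not measurements from the paper.
    baseline_ms = 2.70
    optimized_ms = 2.00
    ratio = speedup(baseline_ms, optimized_ms)
    gain = percent_faster(baseline_ms, optimized_ms)
    print(f"speedup: {ratio:.2f}x ({gain:.0f}% faster)")
```

Under this reading, a 35% speedup means the optimized kernel takes roughly 1/1.35, or about 74%, of the baseline's running time.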
Abstract
Optimizing GPU kernels manually is a challenging and time-consuming task. With the rapid development of LLMs, automated GPU kernel optimization is gradually becoming a tangible reality. However, current LLM-driven automated optimization methods narrowly focus on machine learning applications, such as PyTorch operator optimization, while overlooking broader domains like sparse matrix operations in scientific computing. Extending to these broader applications brings new challenges for the benchmark and algorithm. Therefore, developing a general-purpose automated kernel optimization method becomes our primary focus. In this paper, we address the absence of systematic evaluation for multi-scenario settings by introducing MSKernelBench, which spans multiple scenarios, including fundamental algebraic operations, common LLM kernels, sparse matrix operators, and scientific computing routines,…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications
