CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Xiaoya Li, Xiaofei Sun, Albert Wang, Jiwei Li, Chris Shum

TL;DR
CUDA-L1 is a contrastive reinforcement learning framework for CUDA kernel optimization. Trained on A100, it reaches an average speedup of 3.12x across the 250 KernelBench kernels and uncovers general principles of CUDA optimization, advancing automated GPU performance tuning.
Contribution
The paper presents CUDA-L1, a novel contrastive RL approach that automates CUDA optimization, outperforms baselines such as Torch Compile and CUDA Graph, and reveals new insights into CUDA performance improvements.
Findings
Achieves an average speedup of 3.12x on CUDA kernels.
Outperforms Torch Compile, CUDA Graph, and cuDNN libraries.
Discovers and strategically combines CUDA optimization techniques.
Abstract
The exponential growth in demand for GPU computing resources has created an urgent need for automated CUDA optimization strategies. While recent advances in LLMs show promise for code generation, current SOTA models achieve low success rates in improving CUDA speed. In this paper, we introduce CUDA-L1, an automated reinforcement learning framework for CUDA optimization that employs a novel contrastive RL algorithm. CUDA-L1 achieves significant performance improvements on the CUDA optimization task: trained on A100, it delivers an average speedup of 3.12x and a median speedup of 1.42x against default baselines across all 250 CUDA kernels of KernelBench, with peak speedups reaching 120x. In addition to the default baseline provided by KernelBench, CUDA-L1 demonstrates 2.77x over Torch Compile, 2.88x over Torch Compile with reduce overhead, 2.81x over CUDA Graph implementations,…
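The gap between the reported average (3.12x) and median (1.42x) speedups suggests a skewed distribution in which a few very large wins pull the mean up. The sketch below illustrates that effect under stated assumptions: the timings are made-up numbers rather than KernelBench results, the helper names `speedups` and `summarize` are hypothetical, and treating the average as an arithmetic mean of per-kernel ratios is an assumption, since the abstract does not say how it is computed.

```python
from statistics import mean, median


def speedups(baseline_ms, optimized_ms):
    """Per-kernel speedup = baseline runtime / optimized runtime."""
    if len(baseline_ms) != len(optimized_ms):
        raise ValueError("timing lists must have the same length")
    return [b / o for b, o in zip(baseline_ms, optimized_ms)]


def summarize(ratios):
    """Arithmetic mean, median, and peak of the speedup ratios.

    Assumption: the paper's aggregation method is not stated in the
    abstract; an arithmetic mean is used here for illustration only.
    """
    return {"mean": mean(ratios), "median": median(ratios), "max": max(ratios)}


# Hypothetical timings in milliseconds (NOT measured data).
baseline_ms = [10.0, 8.0, 6.0, 120.0, 5.0]
optimized_ms = [8.0, 8.0, 4.0, 1.0, 4.0]

ratios = speedups(baseline_ms, optimized_ms)
summary = summarize(ratios)
print(summary)
```

With these illustrative numbers, a single 120x outlier lifts the mean far above the median, which is why reporting both statistics gives a fuller picture than the mean alone.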
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Neural Network Applications · Graph Theory and Algorithms
