Dr. Kernel: Reinforcement Learning Done Right for Triton Kernel Generations
Wei Liu, Jiawei Xu, Yingru Li, Longtao Zheng, Tianjian Li, Qian Liu, Junxian He

TL;DR
This paper introduces Dr. Kernel, a reinforcement learning approach for generating high-performance Triton kernels. It pairs a new training environment with methods that counter reward hacking and lazy optimization, and it reaches performance comparable to Claude-4.5-Sonnet.
Contribution
The paper develops KernelGYM, a robust distributed GPU environment for RL-based kernel generation, and proposes TRLOO and profiling-based techniques to improve training stability and kernel performance.
Findings
Dr. Kernel-14B achieves performance comparable to Claude-4.5-Sonnet.
31.6% of generated kernels achieve at least a 1.2x speedup.
Selecting the best candidate across turns raises the share of kernels with at least a 1.2x speedup to 47.8%.
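A threshold metric such as "at least 1.2x speedup" can be read as the share of tasks whose generated kernel clears that bar, and candidate selection as keeping the fastest attempt per task. The sketch below illustrates that reading only. It is not the paper's evaluation code; the function names, the sample timings, and the convention of scoring an incorrect kernel as 0.0 are all assumptions.

```python
from typing import Sequence


def fast_rate(speedups: Sequence[float], threshold: float = 1.2) -> float:
    """Fraction of tasks whose speedup over the reference is >= threshold."""
    return sum(s >= threshold for s in speedups) / len(speedups)


def best_per_task(per_turn: Sequence[Sequence[float]]) -> list[float]:
    """Keep the best speedup seen across all turns for each task."""
    return [max(turns) for turns in per_turn]


# Hypothetical speedups (reference_time / kernel_time) for 4 tasks x 3 turns.
# Assumption: an incorrect kernel is recorded as 0.0 so it never counts as fast.
per_turn_speedups = [
    [0.9, 1.1, 1.3],
    [1.0, 1.0, 1.05],
    [1.25, 1.1, 1.0],
    [0.0, 0.8, 1.5],
]

# Rate when only the final turn of each task is scored.
last_rate = fast_rate([turns[-1] for turns in per_turn_speedups])

# Rate when the best candidate across turns is selected for each task.
best_rate = fast_rate(best_per_task(per_turn_speedups))

print(last_rate, best_rate)
```

With these made-up numbers, best-candidate selection can only match or raise the rate, because each task's best turn is at least as fast as its last turn.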
Abstract
High-quality kernels are critical for scalable AI systems, and enabling LLMs to generate such code would advance AI development. However, training LLMs for this task requires sufficient data and a robust environment, and the process is often vulnerable to reward hacking and lazy optimization. In these cases, models may hack training rewards and prioritize trivial correctness over meaningful speedup. In this paper, we systematically study reinforcement learning (RL) for kernel generation. We first design KernelGYM, a robust distributed GPU environment that supports reward-hacking checks, data collection from multi-turn interactions, and long-term RL training. Building on KernelGYM, we investigate effective multi-turn RL methods and identify a biased policy gradient issue caused by self-inclusion in GRPO. To solve this, we propose Turn-level Reinforce-Leave-One-Out (TRLOO) to provide unbiased…
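The self-inclusion issue mentioned in the abstract can be illustrated with a generic leave-one-out baseline. When the baseline for a sample is the mean over the whole group, the sample's own reward is part of its baseline; a leave-one-out baseline averages only the other samples. The sketch below is a minimal illustration under assumptions: it omits GRPO's standard-deviation normalization, the function names and rewards are hypothetical, and the truncated abstract does not show how TRLOO defines turn-level returns, so that part is not reproduced here.

```python
def group_mean_advantages(rewards: list[float]) -> list[float]:
    """Baseline is the mean of the whole group, including the sample itself."""
    mean = sum(rewards) / len(rewards)
    return [r - mean for r in rewards]


def leave_one_out_advantages(rewards: list[float]) -> list[float]:
    """Baseline for each sample is the mean reward of the *other* samples."""
    n = len(rewards)
    if n < 2:
        raise ValueError("leave-one-out needs at least two samples")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


# Hypothetical rewards for one group of samples at a single turn.
rewards = [1.0, 0.0, 0.0, 0.0]

gm = group_mean_advantages(rewards)
loo = leave_one_out_advantages(rewards)

print(gm)
print(loo)
```

For a group of size N, the leave-one-out advantage equals N/(N-1) times the group-mean advantage, so including a sample in its own baseline shrinks its advantage, and the effect is largest for small groups. In a turn-level setting, one would presumably apply such a baseline separately to the samples of each turn.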
Taxonomy
Topics: Advanced Neural Network Applications · Reinforcement Learning in Robotics · Parallel Computing and Optimization Techniques
