DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels
Haolei Bai, Lingcheng Kong, Xueyi Chen, Jianmian Wang, Zhiqiang Tao, Huan Wang

TL;DR
This paper introduces DICE, a series of diffusion large language models tailored for CUDA kernel generation. It is trained on a new dataset with a bi-phase reinforcement learning framework and outperforms comparable autoregressive and diffusion models on KernelBench.
Contribution
The paper presents DICE, a diffusion LLM for CUDA kernel generation, along with the CuKe supervised fine-tuning dataset and the BiC-RL training framework.
Findings
DICE outperforms comparable autoregressive and diffusion models on KernelBench.
The model achieves state-of-the-art results in CUDA kernel generation.
The approach demonstrates the effectiveness of diffusion models in code synthesis.
Abstract
Diffusion large language models (dLLMs) have emerged as a compelling alternative to autoregressive (AR) LLMs, owing to their capacity for parallel token generation. This paradigm is particularly well-suited for code generation, where holistic structural planning and non-sequential refinement are critical. Despite this potential, tailoring dLLMs for CUDA kernel generation remains challenging, obstructed not only by the high specialization but also by the severe lack of high-quality training data. To address these challenges, we construct CuKe, an augmented supervised fine-tuning dataset optimized for high-performance CUDA kernels. On top of it, we propose a bi-phase curated reinforcement learning (BiC-RL) framework consisting of a CUDA kernel infilling stage and an end-to-end CUDA kernel generation stage. Leveraging this training framework, we introduce DICE, a series of diffusion large…
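The abstract names a CUDA kernel infilling stage but this listing does not describe how the infilling examples are built. The following is a minimal sketch of what such a task could look like, assuming a line-level masking scheme: a contiguous span of a kernel is hidden and the model is asked to recover it. The `<mask>` placeholder, the function names, and the span-selection rule are hypothetical illustrations, not the authors' method.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token, not taken from the paper


def make_infilling_example(kernel_src: str, span_len: int, seed: int = 0):
    """Hide `span_len` consecutive lines of a kernel.

    Returns (masked_source, target_lines). The span start is chosen at
    random; this selection rule is an assumption made for illustration.
    """
    lines = kernel_src.splitlines()
    if not 0 < span_len <= len(lines):
        raise ValueError("span_len must be between 1 and the number of lines")
    rng = random.Random(seed)
    start = rng.randrange(len(lines) - span_len + 1)
    target = lines[start:start + span_len]
    masked = lines[:start] + [MASK] * span_len + lines[start + span_len:]
    return "\n".join(masked), target


def fill(masked_src: str, target):
    """Put the target lines back into the mask slots, in order."""
    it = iter(target)
    return "\n".join(next(it) if ln == MASK else ln
                     for ln in masked_src.splitlines())


KERNEL = """__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        c[i] = a[i] + b[i];
    }
}"""

masked, target = make_infilling_example(KERNEL, span_len=2, seed=0)
print(masked)
```

In this sketch, an end-to-end generation example would correspond to masking the whole kernel, so the infilling task can be read as a constrained version of the full generation task.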
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Data Classification
