Beyond Execution: Static-Analysis Rewards and Hint-Conditioned Diffusion RL for Code Generation
Shuyin Ouyang, Zhaozhi Qian, Faroq AL-Tam, Muhammad AL-Qurishi, Jie M. Zhang

TL;DR
This paper systematically studies reinforcement learning strategies for diffusion-based code generation, emphasizing reward design, hint conditioning, and task difficulty, with static checking emerging as a strong execution-free reward.
Contribution
It provides empirical insights into how reward design and hint conditioning influence diffusion RL performance across varying task difficulties.
Findings
Static checking is the most effective execution-free reward, significantly improving code generation metrics.
Moderate AST-based hinting benefits harder benchmarks, aiding exploration.
Reward effectiveness varies with task difficulty, with static checking excelling on challenging tasks.
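To make the idea of an execution-free reward concrete, here is a minimal sketch of a static-checking reward that scores a candidate program without running it. The function name `static_check_reward`, the three checks, and the 0.5/0.25/0.25 weights are illustrative assumptions. The summary above does not specify which static checks the paper uses or how it scores them.

```python
import ast


def static_check_reward(code: str) -> float:
    """Hypothetical execution-free reward in [0, 1]; the code is never run.

    The checks and weights are assumptions for illustration only:
      0.50 if the source parses as valid Python,
      0.25 more if it defines at least one function,
      0.25 more if some defined function contains a return statement.
    """
    try:
        tree = ast.parse(code)
    except (SyntaxError, ValueError):
        # Unparseable candidates get no reward.
        return 0.0

    reward = 0.5

    # Collect every function definition, including nested and async ones.
    functions = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if functions:
        reward += 0.25

    # Only count return statements that sit inside a function body.
    if any(
        isinstance(inner, ast.Return)
        for func in functions
        for inner in ast.walk(func)
    ):
        reward += 0.25

    return reward


if __name__ == "__main__":
    print(static_check_reward("def add(a, b):\n    return a + b\n"))
    print(static_check_reward("def add(a, b:\n"))
```

Unlike a unit-test reward, a score of this kind stays nonzero for candidates that are well formed but functionally wrong, which is one plausible reason such a signal could remain usable on tasks where test pass rates are near zero.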
Abstract
Reinforcement Learning (RL) is an important paradigm for aligning Diffusion Language Models (DLMs) toward functional correctness in code generation. However, these models often encounter a "capability cliff" on complex tasks, where execution-based semantic rewards become too low to provide a viable learning signal. In this paper, we present a systematic empirical study of RL post-training for diffusion-based code generation along three axes: reward design, hint-conditioned sampling, and task difficulty. We investigate the effectiveness of execution-free rewards as alternatives to traditional unit-test execution, the role of training-time hint-conditioned diffusion sampling in mitigating exploration bottlenecks, and how the impact of these design choices varies across tasks of different difficulty levels. Across HumanEval, MBPP, and LiveCodeBench, we find that static checking is the…
