Exploring Pass-Rate Reward in Reinforcement Learning for Code Generation
Xin-Ye Li, Ren-Biao Liu, Yun-Ji Zhang, Hui Sun, Zheng Xie, Ming Li

TL;DR
This paper investigates pass-rate rewards in critic-free reinforcement learning for code generation and finds that, although they are less sparse than binary rewards, they do not reliably improve final performance.
Contribution
The study provides a detailed analysis of pass-rate rewards, revealing their limitations and motivating reward designs that are better aligned with full correctness.
Findings
Pass-rate rewards are denser than binary pass-all-tests rewards but do not consistently improve final performance.
Gradient directions from pass-rate rewards can conflict, hindering progress.
Reward calibration is crucial for effective reinforcement learning in code generation.
Abstract
Reinforcement learning (RL) from unit-test feedback has become a standard post-training recipe for improving large language models (LLMs) on code generation. However, the pass-all-tests binary reward can be sparse, yielding no learning signal on challenging problems where none of the sampled solutions passes all tests. A common remedy is to use the test-case pass rate as a surrogate reward. In this work, we study pass-rate rewards in critic-free RL for code generation (e.g., GRPO and RLOO) and report a consistent pattern across base models and algorithms: despite alleviating reward sparsity, pass-rate rewards do not reliably improve final performance over binary rewards in rigorous controlled experiments. To understand this discrepancy, we analyze reward density and the resulting gradient directions. We find that pass-rate rewards are denser, but the induced gradient updates do not…
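The sparsity argument in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the helper names (`binary_reward`, `pass_rate_reward`, `group_advantages`), the toy test outcomes, and the use of GRPO-style mean/std normalization over a group of samples are all assumptions made for illustration. It shows why a group in which no sample passes every test gives zero advantage under the binary reward, while the pass-rate reward still produces a non-zero signal.

```python
from statistics import mean, pstdev


def binary_reward(results):
    """Pass-all-tests reward: 1.0 only if every unit test passed."""
    return 1.0 if all(results) else 0.0


def pass_rate_reward(results):
    """Fraction of unit tests passed (assumes a non-empty test list)."""
    return sum(results) / len(results)


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantage (assumed form): center by the group mean,
    scale by the group standard deviation. No learned critic is used."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: 4 sampled solutions for one hard problem with 4 tests.
# No solution passes all tests.
group = [
    [True, True, False, False],
    [True, False, False, False],
    [False, False, False, False],
    [True, True, True, False],
]

binary = [binary_reward(r) for r in group]
rate = [pass_rate_reward(r) for r in group]

binary_adv = group_advantages(binary)  # all zeros: no learning signal
rate_adv = group_advantages(rate)      # non-zero: partial credit ranks samples

print("binary rewards:", binary, "advantages:", binary_adv)
print("pass-rate rewards:", rate, "advantages:", rate_adv)
```

A denser signal of this kind only ranks samples by partial progress; whether following that ranking leads toward fully correct programs is the question the paper examines.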
