GAPO: Robust Advantage Estimation for Real-World Code LLMs
Jianqing Zhang, Zhezheng Hao, Wei Xia, Hande Dong, Hong Wang, Chenxing Wei, Yuyan Zhou, Yubin Qi, Qiang Lin, Jian Cao

TL;DR
GAPO introduces an adaptive advantage estimation method for code LLMs that improves robustness against noisy reward signals in real-world scenarios, leading to better performance and efficiency.
Contribution
The paper proposes GAPO, a novel adaptive advantage estimation technique that enhances robustness and efficiency in RL fine-tuning of code LLMs under noisy reward conditions.
Findings
Up to 4.35 in-domain exact-match improvement
Up to 5.30 out-of-domain exact-match improvement
Lower clipping ratios and higher GPU throughput
Abstract
Reinforcement learning (RL) is widely used for post-training large language models (LLMs) in code editing, where group-relative methods, such as GRPO, are popular due to their critic-free and normalized advantage estimation. However, in real-world code-editing scenarios, reward distributions are often skewed with unpredictable noise, leading to distorted advantage computation and increased rollout outliers. To address this issue, we propose Group Adaptive Policy Optimization (GAPO), which adaptively finds an interval with the highest SNR (Signal to Noise Ratio) per prompt and uses the median of that interval as an adaptive Q to replace the group mean in advantage calculation to reduce noise further. This adaptive Q robustly handles rollout noise while remaining plug-and-play and efficient. We evaluate GAPO on nine instruction-tuned LLMs (3B-14B) using a collected large dataset of 51,844…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSoftware Engineering Research · Software Testing and Debugging Techniques · Natural Language Processing Techniques
