Leveraging Error Diversity in Group Rollouts for Reinforcement Learning
Wenpu Liu, Yuqi Xu, Weichu Xie, Yongfu Zhu, Shuai Dong, Ziyue Wang, Wenqi Shao, Xiaoying Zhang, Tong Yang, Nan Duan, Jiaqi Wang

TL;DR
This paper introduces Error Diversity Advantage Shaping (EDAS), a simple technique that improves reinforcement learning from verifiable rewards by leveraging error diversity in group responses, leading to consistent performance gains.
Contribution
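The paper proposes EDAS, a lightweight, algorithm-agnostic method that enhances RLVR by modulating the advantage signal for incorrect rollouts based on intra-group error diversity: penalties are amplified for dominant, repeated errors and attenuated for rare, exploratory ones.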
Findings
EDAS improves performance across multiple models and benchmarks.
Error diversity within group responses predicts training success.
EDAS yields an average improvement of 6.29 points over DAPO.
Abstract
Reinforcement Learning from Verifiable Rewards (RLVR) typically samples multiple responses per prompt and assigns binary rewards based on individual correctness, yet the collective structure of the group output, specifically the distribution of errors, is largely discarded. We identify this as a missed opportunity: empirical analysis reveals that error diversity within a group is a strong predictor of training success, with problems eliciting diverse wrong answers benefiting substantially more from RLVR than those producing homogeneous failures. Motivated by this observation, we propose Error Diversity Advantage Shaping (EDAS), a lightweight, algorithm-agnostic technique that modulates the advantage signal for incorrect rollouts based on intra-group error diversity. EDAS amplifies penalties for dominant, repeated errors and attenuates penalties for rare, exploratory ones, thereby…
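The shaping idea described in the abstract can be illustrated with a minimal sketch. The abstract is truncated and does not give the paper's formula, so everything specific here is an assumption made for illustration: the function name `shaped_advantages`, the strength parameter `alpha`, the group-normalized baseline advantage, and the linear scaling rule based on each wrong answer's share among the group's errors. It is not the authors' implementation.

```python
from collections import Counter
from statistics import mean, pstdev


def shaped_advantages(answers, correct, alpha=0.5, eps=1e-6):
    """Hypothetical error-diversity shaping for one group of rollouts.

    answers: final answers extracted from each rollout (hashable values)
    correct: booleans from the verifier, one per rollout
    alpha:   assumed shaping strength in [0, 1]; not taken from the paper
    """
    # Binary verifiable rewards, then a group-normalized baseline advantage
    # (an assumed starting point, in the style of group-based RLVR methods).
    rewards = [1.0 if c else 0.0 for c in correct]
    mu, sigma = mean(rewards), pstdev(rewards)
    base = [(r - mu) / (sigma + eps) for r in rewards]

    wrong = [a for a, c in zip(answers, correct) if not c]
    if not wrong:
        return base  # nothing to shape when every rollout is correct

    counts = Counter(wrong)
    n_wrong = len(wrong)
    # Share each distinct error would have if all errors were equally frequent.
    uniform = 1.0 / len(counts)

    shaped = []
    for a, c, adv in zip(answers, correct, base):
        if c:
            shaped.append(adv)  # correct rollouts are left untouched
            continue
        share = counts[a] / n_wrong
        # Assumed rule: scale > 1 for dominant errors (larger penalty),
        # scale < 1 for rare errors (smaller penalty).
        scale = 1.0 + alpha * (share - uniform)
        shaped.append(adv * scale)
    return shaped


# One correct rollout, a wrong answer repeated three times, one rare wrong answer.
group_answers = ["42", "7", "7", "7", "3"]
group_correct = [True, False, False, False, False]
print(shaped_advantages(group_answers, group_correct))
```

In this toy group the repeated wrong answer "7" receives a more negative advantage than the rare wrong answer "3", while the correct rollout keeps its baseline advantage. When all wrong answers are identical, this particular rule leaves the advantages unchanged; how the paper handles that case is not stated in the truncated abstract.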
