Unveiling Implicit Advantage Symmetry: Why GRPO Struggles with Exploration and Difficulty Adaptation

Zhiqi Yu; Zhangquan Chen; Mengting Liu; Heye Zhang; Liangqiong Qu

arXiv:2602.05548·cs.LG·March 31, 2026

TL;DR

This paper identifies an implicit advantage symmetry in GRAE that hampers exploration and difficulty adaptation in RLVR, and proposes A-GRAE to improve learning efficiency and performance.

Contribution

It uncovers the limitations of symmetric advantage estimation and introduces A-GRAE, a dynamic approach that enhances exploration and difficulty focus in RLVR.

Findings

1. A-GRAE outperforms standard GRPO across seven benchmarks.

2. Asymmetry in advantage estimation promotes better exploration.

3. Curriculum-like sample difficulty shifting improves learning efficiency.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR), particularly GRPO, has become the standard for eliciting LLM reasoning. However, its efficiency in exploration and difficulty adaptation remains an open challenge. In this work, we argue that these bottlenecks stem from an implicit advantage symmetry inherent in Group Relative Advantage Estimation (GRAE). This symmetry induces two critical limitations: (i) at the group level, strict symmetry in weights between correct and incorrect trajectories leaves unsampled action logits unchanged, thereby hindering exploration of novel correct solutions; (ii) at the sample level, the algorithm implicitly prioritizes medium-difficulty samples, remaining agnostic to the non-stationary demands of difficulty focus. Through controlled experiments, we reveal that this symmetric property is sub-optimal, yielding two pivotal insights: (i) asymmetrically…
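
The two symmetries named in the abstract can be illustrated numerically. The sketch below is not taken from the paper or its repository: it assumes binary 0/1 verifiable rewards and the commonly used group normalization A_i = (r_i - mean) / std, and the function names (`group_relative_advantages`, `total_abs_advantage`) are hypothetical. Under those assumptions, the advantages of correct and incorrect trajectories in a group cancel exactly, and the total advantage magnitude of a prompt peaks when half of its samples are correct.

```python
import math


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group: A_i = (r_i - mean) / (std + eps).

    Assumption: population standard deviation and a small eps for stability,
    as in common GRPO implementations; the paper's exact form may differ.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    var = sum((r - mean) ** 2 for r in rewards) / n
    std = math.sqrt(var)
    return [(r - mean) / (std + eps) for r in rewards]


def total_abs_advantage(num_correct, group_size):
    """Sum of |A_i| for a group with `num_correct` rewards of 1 and the rest 0."""
    rewards = [1.0] * num_correct + [0.0] * (group_size - num_correct)
    return sum(abs(a) for a in group_relative_advantages(rewards))


if __name__ == "__main__":
    G = 8  # hypothetical group size
    for k in range(G + 1):
        rewards = [1.0] * k + [0.0] * (G - k)
        adv = group_relative_advantages(rewards)
        pos = sum(a for a in adv if a > 0)  # total weight on correct samples
        neg = sum(a for a in adv if a < 0)  # total weight on incorrect samples
        print(f"pass rate {k / G:.3f}  pos {pos:+.3f}  neg {neg:+.3f}  "
              f"total |A| {total_abs_advantage(k, G):.3f}")
```

Running the script prints one row per pass rate. In every row the positive and negative totals cancel (the group-level symmetry), and the total magnitude follows 2G·sqrt(p(1-p)) for pass rate p, so it is largest at p = 0.5 and drops to zero for prompts that are always solved or never solved (the sample-level preference for medium difficulty).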

Code & Models

Repositories

hku-healthai/A-GRAE (GitHub)