On the Hidden Objective Biases of Group-based Reinforcement Learning

Aleksandar Fontana; Marco Simoni; Giulio Rossolini; Andrea Saracino; Paolo Mori

arXiv:2601.05002 · cs.LG · January 9, 2026

TL;DR

This paper provides a theoretical analysis of group-based reinforcement learning methods like GRPO, revealing inherent biases and limitations that affect training dynamics and policy optimization.

Contribution

It introduces a unified surrogate formulation for analyzing GRPO-style methods, uncovering systematic gradient biases and optimizer interactions that affect training.

Findings

01 Non-uniform group weighting causes gradient biases.

02 Interactions with AdamW reduce sensitivity to reward scaling.

03 Optimizer momentum can lead to policy updates beyond intended clipping.
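
Finding 02 can be illustrated with a toy sketch. This is not the paper's experiment: it assumes that rescaling the reward simply multiplies every gradient by a constant `scale`, and it uses a hand-written scalar AdamW step (the function names, gradient values, and hyperparameters are all illustrative). Under that assumption, the AdamW trajectory barely changes when the gradients are scaled by 100, while plain SGD moves 100 times further.

```python
import math

def adamw_trajectory(grads, scale, lr=1e-3, beta1=0.9, beta2=0.999,
                     eps=1e-8, weight_decay=0.01, theta0=1.0):
    """Scalar AdamW on a fixed gradient sequence multiplied by `scale`."""
    theta, m, v = theta0, 0.0, 0.0
    history = []
    for t, g in enumerate(grads, start=1):
        g = scale * g                           # assumed effect of reward scaling
        m = beta1 * m + (1 - beta1) * g         # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g     # second-moment estimate
        m_hat = m / (1 - beta1 ** t)            # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * weight_decay * theta      # decoupled weight decay
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append(theta)
    return history

def sgd_trajectory(grads, scale, lr=1e-3, theta0=1.0):
    """Plain SGD on the same scaled gradient sequence, for contrast."""
    theta = theta0
    history = []
    for g in grads:
        theta -= lr * scale * g
        history.append(theta)
    return history

def max_gap(a, b):
    """Largest absolute difference between two trajectories."""
    return max(abs(x - y) for x, y in zip(a, b))

grads = [0.5, -0.2, 0.8, 0.1, -0.4]  # illustrative values, not from the paper

adamw_gap = max_gap(adamw_trajectory(grads, 1.0), adamw_trajectory(grads, 100.0))
sgd_gap = max_gap(sgd_trajectory(grads, 1.0), sgd_trajectory(grads, 100.0))

print("AdamW gap between scale=1 and scale=100:", adamw_gap)
print("SGD gap between scale=1 and scale=100:  ", sgd_gap)
```

The near-invariance comes from the ratio `m_hat / sqrt(v_hat)`: a constant factor on the gradient cancels between numerator and denominator, leaving only the small `eps` term to break the symmetry.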

Abstract

Group-based reinforcement learning methods, such as Group Relative Policy Optimization (GRPO), are now widely used to post-train large language models. Despite their empirical success, they exhibit structural mismatches between reward optimization and the underlying training objective. In this paper, we present a theoretical analysis of GRPO-style methods by studying them within a unified surrogate formulation. This perspective reveals recurring properties that affect all the methods under analysis: (i) non-uniform group weighting induces systematic gradient biases on shared prefix tokens; (ii) interactions with the AdamW optimizer make training dynamics largely insensitive to reward scaling; and (iii) optimizer momentum can push policy updates beyond the intended clipping region under repeated optimization steps. We believe that these findings highlight fundamental limitations of…
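
Property (iii) can be sketched on a one-parameter toy problem. The setup below is an assumption made for illustration, not the paper's analysis: a single scalar `theta` with importance ratio `r = exp(theta - theta_old)`, a positive advantage, a PPO-style clipped surrogate, and repeated ascent steps on the same sample. Once `r` exceeds `1 + eps_clip` the surrogate gradient is exactly zero, so plain gradient ascent stops, but Adam's first-moment buffer keeps moving the parameter.

```python
import math

def clipped_surrogate_grad(theta, theta_old, adv, eps_clip):
    """Gradient w.r.t. theta of min(r * adv, clip(r, 1 - eps, 1 + eps) * adv)."""
    r = math.exp(theta - theta_old)
    if adv > 0 and r > 1 + eps_clip:
        return 0.0                      # clipped branch is constant in theta
    if adv < 0 and r < 1 - eps_clip:
        return 0.0
    return adv * r                      # d(r * adv)/d(theta) = adv * r

def final_ratio_plain(steps, lr, adv=1.0, eps_clip=0.2):
    """Gradient ascent without momentum; returns the final importance ratio."""
    theta_old = theta = 0.0
    for _ in range(steps):
        theta += lr * clipped_surrogate_grad(theta, theta_old, adv, eps_clip)
    return math.exp(theta - theta_old)

def final_ratio_adam(steps, lr, adv=1.0, eps_clip=0.2,
                     beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam-style ascent; the momentum buffer persists after the gradient vanishes."""
    theta_old = theta = 0.0
    m = v = 0.0
    for t in range(1, steps + 1):
        g = clipped_surrogate_grad(theta, theta_old, adv, eps_clip)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        theta += lr * m_hat / (math.sqrt(v_hat) + eps)
    return math.exp(theta - theta_old)

EPS_CLIP = 0.2   # illustrative clipping range
plain_ratio = final_ratio_plain(steps=20, lr=0.05, eps_clip=EPS_CLIP)
adam_ratio = final_ratio_adam(steps=20, lr=0.05, eps_clip=EPS_CLIP)

print("upper clip bound:      ", 1 + EPS_CLIP)
print("final ratio, plain GD: ", plain_ratio)
print("final ratio, Adam:     ", adam_ratio)
```

In this toy, plain ascent overshoots the bound only by its last discrete step and then stays put, whereas the Adam-style update keeps drifting because `m` decays geometrically rather than resetting when the gradient becomes zero.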

Taxonomy

Topics: Reinforcement Learning in Robotics · Natural Language Processing Techniques · Topic Modeling