AGPO: Adaptive Group Policy Optimization with Dual Statistical Feedback

Miaobo Hu; Shuhao Hu; Bokun Wang; Ruohan Wang; Xin Wang; Xiaobo Guo; Daren Zha; Jun Xiao

arXiv:2605.20722 · cs.LG · May 21, 2026

TL;DR

AGPO is an adaptive, critic-free reinforcement learning method that adjusts the clipping range and decoding temperature during training from group-level statistical feedback, improving large language model reasoning on nine math/STEM benchmarks.

Contribution

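The paper presents an adaptive group policy optimization method that refines GRPO without a critic network. A shared probe-derived statistical state drives adaptive clipping and bidirectional temperature sampling, which improves training stability and outperforms PPO and GRPO on nine English and Chinese math/STEM benchmarks.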

Findings

01

AGPO outperforms PPO/GRPO on nine benchmarks, including GSM8K and MATH.

02

Gains transfer to other models, such as Llama-3-8B and Gemma-2-9B.

03

Ablation studies confirm that both modules (adaptive clipping and adaptive temperature sampling) contribute to the gains.

Abstract

Reinforcement learning improves LLM reasoning, but PPO/GRPO typically use fixed clipping and decoding temperature, which makes training brittle and tuning-heavy. We propose Adaptive Group Policy Optimization (AGPO), a critic-free refinement of GRPO that uses group-level statistics to control both update magnitude and exploration. AGPO uses a shared probe-derived statistical state to drive two controllers: (i) adaptive clipping, which sets the trust-region size from reward dispersion and skewness, probe vote entropy, policy entropy, and step-wise KL drift; and (ii) bidirectional adaptive temperature sampling, which heats or cools decoding around a base temperature according to centered uncertainty relative to a running baseline. On nine English and Chinese math/STEM benchmarks, Qwen2.5-14B trained with AGPO outperforms PPO/GRPO under the same generated-token budget, reaching 67.3% on…
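The two controllers described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract names the input signals but not the functional forms, so every formula, constant, and function name below (`group_stats`, `vote_entropy`, `adaptive_clip`, `adaptive_temperature`, `update_baseline`) is a hypothetical choice made for illustration. The sketch also omits the policy-entropy signal, and the direction of heating versus cooling is assumed rather than taken from the paper.

```python
import math
from statistics import mean, pstdev


def group_stats(rewards):
    """Dispersion (population std) and skewness of one group's rewards."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:
        return 0.0, 0.0
    skew = mean([((r - mu) / sigma) ** 3 for r in rewards])
    return sigma, skew


def vote_entropy(answers):
    """Shannon entropy of probe answer votes, normalized by log(n) to [0, 1]."""
    n = len(answers)
    if n <= 1:
        return 0.0
    counts = {}
    for a in answers:
        counts[a] = counts.get(a, 0) + 1
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n)


def adaptive_clip(base_eps, sigma, skew, h_vote, kl_drift,
                  eps_min=0.05, eps_max=0.4, kl_target=0.01):
    """Assumed rule: widen the clip range when rewards are dispersed and
    probes disagree, narrow it when rewards are skewed or KL drift is high."""
    widen = 1.0 + 0.5 * math.tanh(sigma) + 0.25 * h_vote
    widen -= 0.1 * math.tanh(abs(skew))
    # Shrink only once the step-wise KL drift exceeds the (assumed) target.
    shrink = 1.0 / (1.0 + max(0.0, kl_drift / kl_target - 1.0))
    return min(eps_max, max(eps_min, base_eps * widen * shrink))


def adaptive_temperature(base_temp, uncertainty, baseline,
                         gain=0.5, t_min=0.3, t_max=1.5):
    """Assumed rule: heat when uncertainty is above the running baseline,
    cool when it is below; a negative gain would flip the direction."""
    centered = uncertainty - baseline
    temp = base_temp * (1.0 + gain * math.tanh(centered))
    return min(t_max, max(t_min, temp))


def update_baseline(baseline, uncertainty, momentum=0.9):
    """Exponential moving average used as the running baseline (assumed)."""
    return momentum * baseline + (1.0 - momentum) * uncertainty


# Example: one training step with a group of 4 probe samples.
rewards = [1.0, 0.0, 1.0, 0.0]
answers = ["42", "41", "42", "40"]
sigma, skew = group_stats(rewards)
h_vote = vote_entropy(answers)
eps = adaptive_clip(0.2, sigma, skew, h_vote, kl_drift=0.005)
temp = adaptive_temperature(1.0, h_vote, baseline=0.5)
baseline = update_baseline(0.5, h_vote)
```

Both controllers read from the same per-step statistics, which mirrors the shared probe-derived state in the abstract. The clamping bounds keep either output from drifting far from its base value, a design choice made here for safety rather than one confirmed by the source.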

Code & Models

Repositories

wandugu/paper_agpo (GitHub)