On Group Relative Policy Optimization Collapse in Agent Search: The Lazy Likelihood-Displacement

Wenlong Deng; Yushu Li; Boying Gong; Yi Ren; Christos Thrampoulidis; Xiaoxiao Li

arXiv:2512.04220 · cs.CL · February 3, 2026

TL;DR

This paper identifies Lazy Likelihood Displacement (LLD), a core failure mode of Group Relative Policy Optimization (GRPO) in tool-integrated reinforcement learning (RL). It then proposes a regularization method that stabilizes training and improves performance.
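GRPO is value-free: it computes advantages by comparing responses within a group instead of using a learned critic. The sketch below shows the commonly used group-relative advantage, (r_i - mean(r)) / (std(r) + eps), computed over the rewards of responses sampled for one prompt. It is a generic illustration and not code from the paper. The function name, the binary exact-match reward, and the use of the population standard deviation are assumptions; some implementations use the sample standard deviation instead.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of responses to the same prompt.

    Illustrative GRPO-style advantage: A_i = (r_i - mean(r)) / (std(r) + eps).
    No value function (critic) is involved. Uses the population std (an assumption).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one question; reward 1.0 if the final answer is correct, else 0.0.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print([round(a, 3) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
```

If every response in a group earns the same reward, all advantages are zero, and that group contributes no policy-gradient signal.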

Contribution

It uncovers Lazy Likelihood Displacement as a key cause of training collapse in GRPO and introduces LLDS, a likelihood-preserving regularization, to mitigate it.
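This page does not state the LLDS objective. The following is therefore only a hypothetical sketch of what "likelihood-preserving" could mean in practice. It applies a hinge penalty to any drop in a response's sequence log-likelihood (the sum of token log-probabilities) between the rollout policy and the updated policy, then adds that penalty to the policy loss with weight `lam`. The names `likelihood_drop_penalty` and `regularized_loss`, the optional `mask`, and `lam = 0.1` are illustrative assumptions, not the paper's formulation.

```python
from typing import Optional, Sequence


def likelihood_drop_penalty(
    logp_old: Sequence[float],
    logp_new: Sequence[float],
    mask: Optional[Sequence[bool]] = None,
) -> float:
    """Hypothetical likelihood-preserving penalty (NOT the paper's LLDS objective).

    Averages max(0, logp_old - logp_new) over the selected responses, so only
    decreases in sequence log-likelihood are penalized; increases cost nothing.
    """
    if mask is None:
        mask = [True] * len(logp_old)
    drops = [max(0.0, o - n) for o, n, keep in zip(logp_old, logp_new, mask) if keep]
    return sum(drops) / len(drops) if drops else 0.0


def regularized_loss(policy_loss: float, penalty: float, lam: float = 0.1) -> float:
    # lam is an illustrative weight, not a value taken from the paper.
    return policy_loss + lam * penalty


# Response 0 lost 0.5 nats of log-likelihood; response 1 gained likelihood and is not penalized.
pen = likelihood_drop_penalty([-10.0, -12.0], [-10.5, -11.0])
print(pen)  # 0.25
```

In a real trainer the log-likelihoods would be autodiff tensors so that the penalty contributes gradients. Plain floats keep this sketch dependency-free. The mask restricts the penalty to a chosen subset of responses, for example only correct ones; this sketch leaves that design choice open.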

Findings

01

LLD causes early stagnation and collapse in training.

02

The proposed LLDS regularization stabilizes training and prevents gradient explosion.

03

LLDS yields significant performance improvements on multiple benchmarks.
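The abstract defines LLD as a systematic reduction or stagnation in the likelihood of both correct and incorrect responses, and Finding 01 says it appears early in training. One simple way to watch for this is to log the mean log-likelihood of correct and incorrect responses at each step, then flag windows in which neither rises. The sketch below does that. The window length, the tolerance, and the function names are hypothetical choices, not thresholds from the paper.

```python
from typing import Sequence


def window_change(values: Sequence[float], window: int) -> float:
    """Change in a logged metric over the last `window` steps (last minus first)."""
    if len(values) < window + 1:
        raise ValueError("need at least window + 1 logged values")
    return values[-1] - values[-1 - window]


def lld_flag(
    correct_logp: Sequence[float],
    incorrect_logp: Sequence[float],
    window: int = 3,
    tol: float = 0.01,
) -> bool:
    """Heuristic LLD check (hypothetical thresholds, not from the paper).

    Returns True when the mean log-likelihood of BOTH correct and incorrect
    responses has decreased or stayed flat (change <= tol) over the window.
    """
    return (
        window_change(correct_logp, window) <= tol
        and window_change(incorrect_logp, window) <= tol
    )


# Healthy run: correct responses become more likely while incorrect ones fall.
healthy = lld_flag([-20.0, -18.0, -16.0, -14.0], [-20.0, -21.0, -22.0, -23.0])
# LLD-like run: likelihood of correct AND incorrect responses both decline.
stalled = lld_flag([-20.0, -20.5, -21.0, -21.5], [-22.0, -23.0, -24.0, -25.0])
print(healthy, stalled)  # False True
```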

Abstract

Tool-integrated (TI) reinforcement learning (RL) enables large language models (LLMs) to perform multi-step reasoning by interacting with external tools such as search engines and retrievers. Group Relative Policy Optimization (GRPO), exemplified by the recent Search-R1, offers fast convergence and a value-free formulation that makes it appealing for this setting, yet consistently suffers from training collapse. We identify Lazy Likelihood Displacement (LLD), a systematic reduction or stagnation in the likelihood of both correct and incorrect responses, as the core mechanism driving this failure. LLD emerges early and triggers a self-reinforcing LLD Death Spiral, where declining likelihood leads to low-confidence responses, inflating gradients, and ultimately causing collapse. We empirically characterize this process across models on a Search-R1-style, search-integrated question…
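A standard calculation gives some intuition for why, in the death spiral, low-confidence responses lead to larger gradients. For a softmax policy, the gradient of log p(target) with respect to the logits is onehot(target) - softmax(logits). Its norm lies between 1 - p and sqrt(2) * (1 - p), where p is the target probability, so the norm grows as the model becomes less confident. The token-level sketch below compares a confident prediction with an unsure one on a toy 4-token vocabulary. It illustrates that textbook identity only and does not reproduce the paper's analysis. Because this per-token quantity is bounded, the paper's account of gradient inflation may involve additional factors.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logprob_grad_norm(logits: List[float], target: int) -> float:
    """L2 norm of d log p_target / d logits, which equals onehot(target) - softmax(logits)."""
    probs = softmax(logits)
    grad = [(1.0 if i == target else 0.0) - p for i, p in enumerate(probs)]
    return math.sqrt(sum(g * g for g in grad))


# Same toy 4-token vocabulary; only the logit of the sampled (target) token changes.
confident = logprob_grad_norm([4.0, 0.0, 0.0, 0.0], target=0)  # p(target) is about 0.95
unsure = logprob_grad_norm([-2.0, 0.0, 0.0, 0.0], target=0)    # p(target) is about 0.04
print(confident < unsure)  # True
```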

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics