RAGEN-2: Reasoning Collapse in Agentic RL

Zihan Wang; Chi Gui; Xing Jin; Qineng Wang; Licheng Liu; Kangrui Wang; Shiqi Chen; Linjie Li; Zhengyuan Yang; Pingyue Zhang; Yiping Lu; Jiajun Wu; Li Fei-Fei; Lijuan Wang; Yejin Choi; Manling Li

arXiv:2604.06268 · cs.LG · April 9, 2026

TL;DR

This paper identifies template collapse, a failure mode in RL-trained multi-turn LLM agents in which the model's reasoning follows input-agnostic templates, and proposes mutual-information diagnostics and SNR-aware filtering to improve reasoning stability and task performance.

Contribution

It introduces a mutual-information-based diagnostic for reasoning collapse and proposes SNR-aware filtering to improve reasoning diversity and task success.
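This page names SNR-aware filtering without defining it, so the sketch below shows only one plausible reading, which is an assumption and not the paper's criterion. It treats the spread of rollout rewards within a prompt's group as that prompt's signal: under group-normalized advantages, a prompt whose rollouts all earn the same reward yields zero advantage and therefore no reward-driven policy-gradient signal. The filter drops such prompts and keeps only the highest-spread fraction of the rest. The names `snr_filter`, `reward_spread`, and `keep_fraction` are hypothetical.

```python
import math
import statistics


def reward_spread(rewards):
    """Population standard deviation of one prompt's rollout rewards (assumed signal score)."""
    return statistics.pstdev(rewards)


def snr_filter(groups, keep_fraction=0.5):
    """Hypothetical SNR-style filter over prompt groups.

    groups maps a prompt id to the scalar rewards of its rollouts. Groups whose
    rewards are all identical are dropped (their group-normalized advantages are
    all zero); of the rest, only the top keep_fraction by reward spread are kept.
    """
    scored = [(reward_spread(r), pid) for pid, r in groups.items()]
    informative = [(s, pid) for s, pid in scored if s > 0]
    if not informative:
        return []
    # sorted() is stable even with reverse=True, so ties keep insertion order.
    informative = sorted(informative, key=lambda t: t[0], reverse=True)
    k = max(1, math.ceil(keep_fraction * len(informative)))
    return [pid for _, pid in informative[:k]]


groups = {
    "p1": [1.0, 1.0, 1.0, 1.0],  # every rollout succeeds: no reward signal
    "p2": [0.0, 1.0, 0.0, 1.0],  # largest spread
    "p3": [0.0, 0.0, 0.0, 1.0],
    "p4": [1.0, 0.0, 0.0, 0.0],
}
print(snr_filter(groups, keep_fraction=0.5))
```

Here `p1` is discarded because every rollout earned the same reward, and with `keep_fraction=0.5` the filter keeps two of the three remaining prompts. A fixed top-fraction rule is only one design choice; a threshold on the spread, or a spread divided by an explicit noise estimate, would be equally plausible readings of "SNR-aware".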

Findings

01

Mutual information correlates more strongly with performance than entropy.

02

Template collapse can occur even while entropy remains stable.

03

SNR-aware filtering improves input dependence and task performance across multiple tasks.

Abstract

RL training of multi-turn LLM agents is inherently unstable, and reasoning quality directly determines task performance. Entropy is widely used to track reasoning stability. However, entropy only measures diversity within the same input, and cannot tell whether reasoning actually responds to different inputs. In RAGEN-2, we find that even with stable entropy, models can rely on fixed templates that look diverse but are input-agnostic. We call this template collapse, a failure mode invisible to entropy and all existing metrics. To diagnose this failure, we decompose reasoning quality into within-input diversity (Entropy) and cross-input distinguishability (Mutual Information, MI), and introduce a family of mutual information proxies for online diagnosis. Across diverse tasks, mutual information correlates with final performance much more strongly than entropy, making it a more reliable…
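The decomposition in the abstract rests on the standard identity H(Z) = H(Z|X) + I(X;Z): the average within-input entropy H(Z|X) measures diversity for a fixed input, while the mutual information I(X;Z) measures how much the reasoning Z depends on the input X. The sketch below is a toy illustration under stated assumptions, not the paper's MI proxy (this page does not describe how the online proxies are computed): reasoning traces are bucketed into a few labeled "templates", inputs are equally likely, and the probabilities are given exactly. The names `decompose`, `collapsed`, and `grounded` are hypothetical.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution given as probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def decompose(cond):
    """Split marginal reasoning entropy into H(Z|X) and I(X;Z).

    cond maps each input x to a dict {template z: p(z | x)}.
    Inputs are assumed equally likely (a toy assumption).
    Returns (within-input diversity H(Z|X), cross-input distinguishability I(X;Z)).
    """
    n = len(cond)
    marginal = defaultdict(float)  # p(z) = mean over inputs of p(z | x)
    for dist in cond.values():
        for z, p in dist.items():
            marginal[z] += p / n
    h_cond = sum(entropy(dist.values()) for dist in cond.values()) / n
    h_marg = entropy(marginal.values())
    return h_cond, h_marg - h_cond  # I(X;Z) = H(Z) - H(Z|X)


# Template collapse: both inputs draw from the same four templates.
collapsed = {
    "input_a": {f"t{i}": 0.25 for i in range(4)},
    "input_b": {f"t{i}": 0.25 for i in range(4)},
}
# Input-dependent reasoning: each input has its own four templates.
grounded = {
    "input_a": {f"a{i}": 0.25 for i in range(4)},
    "input_b": {f"b{i}": 0.25 for i in range(4)},
}

for name, policy in [("collapsed", collapsed), ("grounded", grounded)]:
    h, mi = decompose(policy)
    print(f"{name}: H(Z|X) = {h:.3f} nats, I(X;Z) = {mi:.3f} nats")
```

Both toy policies have the same within-input entropy, ln 4 ≈ 1.386 nats, so an entropy monitor cannot tell them apart; only the grounded policy has nonzero mutual information, I(X;Z) = ln 2 ≈ 0.693 nats. This is the template-collapse situation described above in miniature.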

Code & Models

Repositories

mll-lab-nu/RAGEN (GitHub)

Datasets

SeanWang0027/RAGEN (dataset, 663 downloads)