Monitorability as a Free Gift: How RLVR Spontaneously Aligns Reasoning

Zidi Xiong; Shan Chen; Himabindu Lakkaraju

arXiv:2602.03978 · cs.AI · February 5, 2026

Open Access

TL;DR

This paper investigates how monitorability, the degree to which chain-of-thought traces reflect a model's internal reasoning, can improve spontaneously during RLVR training. The effect is shaped by data diversity and training dynamics, and it is not necessarily linked to reasoning performance.

Contribution

The paper provides a systematic evaluation of how monitorability emerges under RLVR. It highlights the dependence of monitorability on training data, its independence from reasoning capability, and mechanistic insights into the factors behind it.

Findings

1. Monitorability improvements are data-dependent.
2. Data diversity and instruction-following data are critical.
3. Monitorability is orthogonal to reasoning performance.

Abstract

As Large Reasoning Models (LRMs) are increasingly deployed, auditing their chain-of-thought (CoT) traces for safety becomes critical. Recent work has reported that monitorability, the degree to which CoT faithfully and informatively reflects internal computation, can appear as a "free gift" during the early stages of Reinforcement Learning with Verifiable Rewards (RLVR). We make this observation concrete through a systematic evaluation across model families and training domains. Our results show that this effect is not universal: monitorability improvements are strongly data-dependent. In particular, we demonstrate the critical role of data diversity and instruction-following data during RLVR training. We further show that monitorability is orthogonal to capability: improvements in reasoning performance do not imply increased transparency. Through mechanistic analysis, we attribute…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Adversarial Robustness in Machine Learning