Reasoning Models Sometimes Output Illegible Chains of Thought

Arun Jose

arXiv:2510.27338·cs.LG·November 3, 2025

Open Access

TL;DR

This paper investigates how outcome-based reinforcement learning affects the legibility of chain-of-thought reasoning in language models. It finds that models often reach correct answers through illegible reasoning, which makes chain-of-thought monitoring harder.

Contribution

The paper analyzes chain-of-thought legibility across 14 reasoning models, showing how RL affects reasoning transparency and what that implies for AI safety.

Findings

01 RL often causes illegible reasoning in models

02 Illegible reasoning persists even when answers are readable

03 Legibility decreases on more difficult questions

Abstract

Language models trained via outcome-based reinforcement learning (RL) to reason using chain-of-thought (CoT) have shown remarkable performance. Monitoring such a model's CoT may allow us to understand its intentions and detect potential malicious behavior. However, to be effective, this requires that CoTs are legible and faithful. We study CoT legibility across 14 reasoning models, finding that RL often causes reasoning to become illegible to both humans and AI monitors, with reasoning models (except Claude) generating illegible CoTs while returning to perfectly readable final answers. We show that models use illegible reasoning to reach correct answers (accuracy dropping by 53% when forced to use only legible portions), yet find no correlation between legibility and performance when resampling, suggesting the relationship is more nuanced. We also find that legibility degrades on…
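The two measurements named in the abstract (the accuracy drop when a model is restricted to the legible portion of its CoT, and the correlation between legibility and performance across resampled completions) can be sketched in a few lines of Python. This is only an illustrative sketch, not the paper's evaluation code: the function names, the 0/1 correctness flags, the numeric legibility scores, and all sample data are hypothetical. The abstract does not say whether the 53% figure is an absolute or a relative drop, so the sketch returns both.

```python
from math import sqrt


def accuracy(correct_flags):
    """Fraction of answers marked correct (each flag is 0 or 1)."""
    return sum(correct_flags) / len(correct_flags)


def accuracy_drop(full_flags, legible_only_flags):
    """Return (absolute, relative) accuracy drop when the CoT is
    truncated to its legible portion. Both inputs are hypothetical
    per-question correctness flags for the same set of questions."""
    base = accuracy(full_flags)
    cut = accuracy(legible_only_flags)
    absolute = base - cut
    relative = absolute / base if base > 0 else 0.0
    return absolute, relative


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences.
    Returns 0.0 when either sequence has zero variance."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical data: correctness with the full CoT vs. the legible part only.
    full = [1, 1, 1, 0, 1, 1]
    legible_only = [1, 0, 0, 0, 1, 0]
    print(accuracy_drop(full, legible_only))

    # Hypothetical resamples of one question: a legibility score per
    # sample (assumed scale, higher = less legible) and whether it was correct.
    legibility = [1.0, 4.5, 2.0, 7.0, 3.5]
    correct = [1, 1, 0, 1, 0]
    print(pearson(legibility, correct))
```

A correlation near zero on data like this would match the abstract's observation that legibility and performance are not correlated under resampling, even though truncating to legible text lowers accuracy.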

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Ethics and Social Impacts of AI