When Reasoning Traces Become Performative: Step-Level Evidence that Chain-of-Thought Is an Imperfect Oversight Channel
Wenkai Li, Fan Yang, Ananya Hazarika, Shaunak A. Mehta, Koichi Onoue

TL;DR
This paper investigates the alignment between reasoning traces and actual answer formation in language models, revealing frequent mismatches and confabulations that challenge the assumption of trace fidelity.
Contribution
It introduces a step-level Detect-Classify-Compare framework, built around an answer-commitment proxy, to test whether reasoning traces stay synchronized with answer commitment, and it finds confabulation patterns across the nine models studied.
Findings
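Latent commitment and explicit answer arrival align on only 61.9% of steps on average, across nine models and seven reasoning benchmarks.
58.0% of detected mismatch events occur after the answer-commitment proxy has stabilized, while the trace keeps producing deliberative-looking text (confabulated continuation).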
Lower step-level alignment correlates with higher utility of chain-of-thought reasoning.
Abstract
Chain-of-thought (CoT) traces are increasingly used both to improve language model capability and to audit model behavior, implicitly assuming that the visible trace remains synchronized with the computation that determines the answer. We test this assumption with a step-level Detect-Classify-Compare framework built around an answer-commitment proxy that is cross-validated with Patchscopes, tuned-lens probes, and causal direction ablation. Across nine models and seven reasoning benchmarks, latent commitment and explicit answer arrival align on only 61.9% of steps on average. The dominant mismatch pattern is confabulated continuation: 58.0% of detected mismatch events occur after the answer-commitment proxy has already stabilized while the trace continues producing deliberative-looking text, and a vacuousness analysis shows that the committed answer does not change during these steps. In…
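The two headline numbers are both ratios over per-step labels, which a small sketch can make concrete. The code below is an illustration only, not the authors' implementation: the `Step` type, the two boolean flags, and both function names are hypothetical, and it assumes each reasoning step has already been labeled with (a) whether the answer-commitment proxy reports a stabilized answer and (b) whether the visible trace has explicitly arrived at that answer. Under that assumption, a step counts as aligned when the two flags agree, and a mismatch counts as post-commitment when the proxy has stabilized but the trace has not yet stated the answer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    # Hypothetical per-step labels; the paper's actual labeling may differ.
    latent_committed: bool   # proxy reports a stabilized answer at this step
    explicit_arrived: bool   # trace text has explicitly stated the answer


def alignment_rate(steps: List[Step]) -> float:
    """Fraction of steps where latent and explicit signals agree."""
    if not steps:
        raise ValueError("need at least one step")
    aligned = sum(s.latent_committed == s.explicit_arrived for s in steps)
    return aligned / len(steps)


def post_commitment_share(steps: List[Step]) -> float:
    """Among mismatched steps, the fraction that occur after the proxy
    has stabilized while the trace is still deliberating."""
    mismatches = [s for s in steps if s.latent_committed != s.explicit_arrived]
    if not mismatches:
        return 0.0
    post = sum(s.latent_committed and not s.explicit_arrived for s in mismatches)
    return post / len(mismatches)


# Toy five-step trace: the proxy stabilizes at step 2,
# but the trace only states the answer at step 4.
toy_trace = [
    Step(False, False),
    Step(True, False),
    Step(True, False),
    Step(True, True),
    Step(True, True),
]
rate = alignment_rate(toy_trace)
share = post_commitment_share(toy_trace)
```

In this toy trace three of five steps agree, and both mismatched steps fall after commitment. The reported 61.9% and 58.0% figures would correspond to these two ratios only if the paper's step labels match the assumptions stated here.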
