Measuring Chain-of-Thought Monitorability Through Faithfulness and Verbosity
Austin Meek, Eitan Sprejer, Iv\'an Arcuschin, Austin J. Brockmeier, Steven Basart

TL;DR
This paper proposes a new measure of chain-of-thought (CoT) monitorability combining faithfulness and verbosity, enabling better assessment of model reasoning transparency and safety.
Contribution
It introduces a holistic monitorability score for CoT that captures faithfulness and verbosity, extending beyond previous proxy methods to evaluate reasoning transparency.
Findings
Models can be faithful yet hard to monitor if they omit key factors.
Monitorability varies significantly across different model families.
The proposed score effectively assesses CoT transparency and safety potential.
Abstract
Chain-of-thought (CoT) outputs let us read a model's step-by-step reasoning. Since any long, serial reasoning process must pass through this textual trace, the quality of the CoT is a direct window into what the model is thinking. This visibility could help us spot unsafe or misaligned behavior (monitorability), but only if the CoT is transparent about its internal reasoning (faithfulness). Fully measuring faithfulness is difficult, so researchers often focus on examining the CoT in cases where the model changes its answer after adding a cue to the input. This proxy finds some instances of unfaithfulness but loses information when the model maintains its answer, and does not investigate aspects of reasoning not tied to the cue. We extend these results to a more holistic sense of monitorability by introducing verbosity: whether the CoT lists every factor needed to solve the task. We…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPersonal Information Management and User Behavior · Topic Modeling · Advanced Software Engineering Methodologies
