Mechanistic Evidence for Faithfulness Decay in Chain-of-Thought Reasoning
Donald Ye, Max Loffgren, Om Kotadia, Linus Wong

TL;DR
This paper introduces NLDD, a metric to assess whether language models' step-by-step explanations truly reflect their decision process, revealing a reasoning horizon beyond which explanations no longer influence answers.
Contribution
The paper proposes NLDD, a novel metric for evaluating the faithfulness of chain-of-thought explanations and uncovers a consistent reasoning horizon across models and tasks.
Findings
Across the tested models and tasks, a Reasoning Horizon (k*) appears at 70-85% of chain length, beyond which reasoning steps no longer influence the answer.
Models can encode correct reasoning internally but still fail at tasks.
Accuracy alone does not indicate genuine reasoning in models.
Abstract
Chain-of-Thought (CoT) explanations are widely used to interpret how language models solve complex problems, yet it remains unclear whether these step-by-step explanations reflect how the model actually reaches its answer or are merely post-hoc justifications. We propose Normalized Logit Difference Decay (NLDD), a metric that measures whether individual reasoning steps are faithful to the model's decision-making process. Our approach corrupts individual reasoning steps in the explanation and measures how much the model's confidence in its answer drops, which indicates whether a step is truly important. By standardizing these measurements, NLDD enables rigorous comparison across different model architectures. Testing three model families across syntactic, logical, and arithmetic tasks, we discover a consistent Reasoning Horizon (k*) at 70-85% of chain length, beyond which reasoning tokens…
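The corrupt-and-measure procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact NLDD formula, so everything here is an assumption made for illustration: the logit difference is taken as the answer logit minus the largest competing logit, the normalization divides the drop by the clean logit difference, and the horizon is read off as the last step whose score exceeds a hypothetical threshold. The names `logit_difference`, `normalized_drop`, `step_scores`, `reasoning_horizon`, and `toy_score_fn` are hypothetical, and the toy scorer stands in for a real language model.

```python
from typing import Callable, List, Sequence

CORRUPT = "<corrupted>"  # placeholder marker for a corrupted step (assumption)


def logit_difference(logits: Sequence[float], answer_idx: int) -> float:
    """Answer logit minus the largest competing logit (assumed definition)."""
    best_other = max(v for i, v in enumerate(logits) if i != answer_idx)
    return logits[answer_idx] - best_other


def normalized_drop(clean_ld: float, corrupted_ld: float, eps: float = 1e-8) -> float:
    """Drop in logit difference relative to the clean value (assumed normalization)."""
    return (clean_ld - corrupted_ld) / (abs(clean_ld) + eps)


def step_scores(
    steps: List[str],
    answer_idx: int,
    score_fn: Callable[[List[str]], Sequence[float]],
) -> List[float]:
    """Corrupt one step at a time and record the normalized confidence drop."""
    clean_ld = logit_difference(score_fn(steps), answer_idx)
    scores = []
    for k in range(len(steps)):
        corrupted = list(steps)
        corrupted[k] = CORRUPT
        corrupted_ld = logit_difference(score_fn(corrupted), answer_idx)
        scores.append(normalized_drop(clean_ld, corrupted_ld))
    return scores


def reasoning_horizon(scores: Sequence[float], threshold: float = 0.05) -> float:
    """Position of the last influential step, as a fraction of chain length.

    The threshold is a hypothetical cutoff, not a value from the paper.
    """
    last = 0
    for k, s in enumerate(scores, start=1):
        if s > threshold:
            last = k
    return last / len(scores)


def toy_score_fn(steps: List[str]) -> List[float]:
    """Toy stand-in for a model: only the first three steps support the answer."""
    weights = [2.0, 2.0, 2.0, 0.0]
    support = sum(w for w, s in zip(weights, steps) if s != CORRUPT)
    return [support, 1.0]  # [answer logit, competing logit]


chain = ["step 1", "step 2", "step 3", "step 4"]
scores = step_scores(chain, answer_idx=0, score_fn=toy_score_fn)
k_star = reasoning_horizon(scores)
print(scores, k_star)
```

In this toy setup the final step carries no weight, so corrupting it leaves the logit difference unchanged and the horizon falls before it. With a real model, `score_fn` would run a forward pass on the prompt plus the (possibly corrupted) chain and return the answer-token logits.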
Taxonomy
Topics: Embodied and Extended Cognition · Topic Modeling · Multimodal Machine Learning Applications
