Mechanistic Evidence for Faithfulness Decay in Chain-of-Thought Reasoning

Donald Ye; Max Loffgren; Om Kotadia; Linus Wong

arXiv:2602.11201 · cs.CL · February 13, 2026

TL;DR

This paper introduces NLDD, a metric to assess whether language models' step-by-step explanations truly reflect their decision process, revealing a reasoning horizon beyond which explanations no longer influence answers.

Contribution

The paper proposes NLDD, a novel metric for evaluating the faithfulness of chain-of-thought explanations, and uncovers a consistent reasoning horizon across models and tasks.

Findings

01. Models have a reasoning horizon at 70-85% of chain length.

02. Models can encode correct reasoning internally but still fail at tasks.

03. Accuracy alone does not indicate genuine reasoning in models.

Abstract

Chain-of-Thought (CoT) explanations are widely used to interpret how language models solve complex problems, yet it remains unclear whether these step-by-step explanations reflect how the model actually reaches its answer or are merely post-hoc justifications. We propose Normalized Logit Difference Decay (NLDD), a metric that measures whether individual reasoning steps are faithful to the model's decision-making process. Our approach corrupts individual reasoning steps in the explanation and measures how much the model's confidence in its answer drops, to determine whether each step is truly important. By standardizing these measurements, NLDD enables rigorous comparison across different model architectures. Testing three model families across syntactic, logical, and arithmetic tasks, we discover a consistent Reasoning Horizon (k*) at 70-85% of chain length, beyond which reasoning tokens…
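The corrupt-and-measure idea described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the abstract does not give the NLDD formula, so the normalization (dividing the drop by the magnitude of the clean logit difference), the horizon rule (last step whose score reaches a threshold), the threshold value, and all function names below are assumptions made for illustration. The inputs are logit differences that would, in practice, come from running a model on the clean chain and on copies with one step corrupted.

```python
from typing import List


def normalized_drops(clean_ld: float, corrupted_lds: List[float]) -> List[float]:
    """Per-step drop in logit difference, scaled by the clean value.

    clean_ld: logit(correct answer) - logit(alternative) with the intact chain.
    corrupted_lds[k]: the same difference after corrupting step k only.
    ASSUMPTION: dividing by |clean_ld| stands in for the paper's
    standardization, which the abstract does not specify.
    """
    if clean_ld == 0:
        raise ValueError("clean logit difference must be non-zero")
    scale = abs(clean_ld)
    return [(clean_ld - ld) / scale for ld in corrupted_lds]


def reasoning_horizon(scores: List[float], threshold: float = 0.1) -> int:
    """Number of steps up to and including the last influential one.

    ASSUMPTION: a step counts as influential when its normalized drop is
    at least `threshold`; both the rule and the default are hypothetical.
    Returns 0 when no step is influential.
    """
    horizon = 0
    for k, score in enumerate(scores):
        if score >= threshold:
            horizon = k + 1
    return horizon


# Toy numbers (made up, not results from the paper): corrupting early
# steps hurts the answer a lot, corrupting late steps barely matters.
clean = 4.0
corrupted = [1.0, 2.0, 3.0, 3.9, 4.0]
scores = normalized_drops(clean, corrupted)
k_star = reasoning_horizon(scores)
fraction = k_star / len(corrupted)
```

With these toy inputs the late steps produce almost no drop, so the sketch places the horizon before them. Reporting the horizon as a fraction of chain length is what lets chains of different lengths be compared on one scale.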

Taxonomy

Topics: Embodied and Extended Cognition · Topic Modeling · Multimodal Machine Learning Applications