Is Chain-of-Thought Really Not Explainability? Chain-of-Thought Can Be Faithful without Hint Verbalization

Kerem Zaman; Shashank Srivastava

arXiv:2512.23032·cs.CL·May 11, 2026

TL;DR

This paper challenges the view that a chain-of-thought (CoT) explanation is unfaithful whenever it omits a prompt-injected hint. It shows that many such CoTs are judged faithful by other metrics and argues that faithfulness evaluation should go beyond hint-based tests.

Contribution

It introduces a new faithful@k metric, applies causal mediation analysis, and argues for a broader interpretability toolkit beyond hint-based evaluations.

Findings

01

Larger inference-time budgets greatly increase hint verbalization, reaching up to 90% in some settings.

02

Many CoTs flagged as unfaithful by the Biasing Features metric are judged faithful by other metrics, exceeding 50% in some models.

03

Hint omission alone does not prove unfaithfulness.

Abstract

Recent work, using the Biasing Features metric, labels a CoT as unfaithful if it omits a prompt-injected hint that affected the prediction. We argue this metric adopts a narrow notion of faithfulness and confuses unfaithfulness with incompleteness, the lossy compression needed to turn distributed transformer computation into a linear natural language narrative. On multi-hop reasoning tasks with instruct-tuned and reasoning models, many CoTs flagged as unfaithful by Biasing Features are judged faithful by other metrics, exceeding 50% in some models. With a new faithful@k metric, we show that larger inference-time budgets greatly increase hint verbalization (up to 90% in some settings), suggesting much apparent unfaithfulness is due to tight token limits. Using Causal Mediation Analysis, we further show that even non-verbalized hints can causally mediate prediction changes through the…
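The abstract names a faithful@k metric but this page does not define it. As a rough illustration only, the sketch below assumes a pass@k-style reading: given n sampled CoTs for one prompt, of which c verbalize the injected hint, estimate the probability that at least one of k randomly drawn samples verbalizes it. The function name `verbalized_at_k` and this estimator are assumptions for illustration, not the authors' definition.

```python
from math import comb


def verbalized_at_k(n: int, c: int, k: int) -> float:
    """Hypothetical pass@k-style estimate (an assumption, not the paper's formula).

    n: total CoT samples drawn for a prompt
    c: number of those samples that verbalize the hint
    k: sample budget being evaluated (1 <= k <= n)

    Returns the probability that a random size-k subset of the n samples
    contains at least one hint-verbalizing CoT.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that all k drawn samples come from the n - c non-verbalizing ones.
    # math.comb returns 0 when k > n - c, so the result is 1.0 in that case.
    p_none = comb(n - c, k) / comb(n, k)
    return 1.0 - p_none


if __name__ == "__main__":
    # With 10 samples and 2 verbalizing, see how the estimate changes with k.
    for k in (1, 2, 5, 10):
        print(k, round(verbalized_at_k(10, 2, k), 3))
```

Under this assumed reading, the estimate can only stay flat or rise as k grows, which is at least consistent with the reported trend that larger budgets increase hint verbalization. Whether the paper's k counts samples, tokens, or something else is not stated here.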
