Bypassing the Rationale: Causal Auditing of Implicit Reasoning in Language Models
Anish Sathyanarayanan, Aditya Nagarsekar, Aarush Rathore

TL;DR
This paper introduces a causal, layerwise audit of chain-of-thought (CoT) prompting in language models. It finds that the causal influence of the reasoning text is often localized to a narrow range of layers and that some models bypass it almost entirely, which challenges the assumption that CoT is a transparency mechanism.
Contribution
The paper presents an activation-patching audit and the CoT Mediation Index (CMI), a metric for the CoT-specific causal influence of the reasoning text, and uses them to show that this influence varies across models and tasks.
Findings
CoT influence is often localized in narrow reasoning windows.
Models tuned for reasoning show stronger, more structured mediation.
Bypass regimes exist where reasoning text has minimal causal impact.
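The first and third findings can be illustrated with a small sketch that takes a per-layer CMI profile and reports where the influence is concentrated. Everything here is an assumption made for illustration: the function names `reasoning_windows` and `is_bypass`, the thresholding rule, and the toy numbers are hypothetical and are not taken from the paper, which may define windows and bypass regimes differently.

```python
from typing import List, Sequence, Tuple


def reasoning_windows(cmi: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) layer ranges where CMI exceeds the threshold.

    Hypothetical helper: a "window" here is simply a maximal run of
    consecutive layers whose CMI is above the chosen threshold.
    """
    windows: List[Tuple[int, int]] = []
    start = None
    for layer, value in enumerate(cmi):
        if value > threshold:
            if start is None:
                start = layer  # a new window opens at this layer
        elif start is not None:
            windows.append((start, layer - 1))  # the window closed at the previous layer
            start = None
    if start is not None:
        windows.append((start, len(cmi) - 1))  # window runs to the final layer
    return windows


def is_bypass(cmi: Sequence[float], threshold: float) -> bool:
    """Flag a profile whose CMI stays near zero at every layer (assumed criterion)."""
    return all(abs(value) <= threshold for value in cmi)


# Toy per-layer profiles (made-up values, not results from the paper).
localized_profile = [0.01, 0.02, 0.30, 0.28, 0.03, 0.00]
flat_profile = [0.01, -0.02, 0.00, 0.01]

print(reasoning_windows(localized_profile, threshold=0.1))
print(is_bypass(flat_profile, threshold=0.1))
```

With these toy values, the first profile has a single window covering layers 2 and 3, while the second profile never leaves the near-zero band and would be flagged as a bypass case under this assumed criterion.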
Abstract
Chain-of-thought (CoT) prompting is widely used as a reasoning aid and is often treated as a transparency mechanism. Yet behavioral gains under CoT do not imply that the model's internal computation causally depends on the emitted reasoning text, i.e., models may produce fluent rationales while routing decision-critical computation through latent pathways. We introduce a causal, layerwise audit of CoT faithfulness based on activation patching. Our key metric, the CoT Mediation Index (CMI), isolates CoT-specific causal influence by comparing performance degradation from patching CoT-token hidden states against matched control patches. Across multiple model families (Phi, Qwen, DialoGPT) and scales, we find that CoT-specific influence is typically depth-localized into narrow "reasoning windows," and we identify bypass regimes where CMI is near-zero despite plausible CoT text. We further…
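The abstract describes CMI as comparing the performance degradation from patching CoT-token hidden states against matched control patches. A minimal sketch of one way to read that comparison follows, assuming a simple per-layer difference of the two degradations. The exact formula, any normalization, and the names `degradation` and `cot_mediation_index` are assumptions rather than the paper's definition, and the scores are toy values standing in for measurements that an actual activation-patching run would produce.

```python
from typing import List, Sequence


def degradation(clean_score: float, patched_score: float) -> float:
    """Performance drop caused by a patch (positive means the patch hurt)."""
    return clean_score - patched_score


def cot_mediation_index(
    clean_score: float,
    cot_patched_scores: Sequence[float],
    control_patched_scores: Sequence[float],
) -> List[float]:
    """Per-layer CoT-specific effect under an ASSUMED difference-based definition.

    For each layer: (drop from patching CoT-token states)
                    minus (drop from the matched control patch).
    """
    if len(cot_patched_scores) != len(control_patched_scores):
        raise ValueError("CoT and control patches must cover the same layers")
    return [
        degradation(clean_score, cot) - degradation(clean_score, control)
        for cot, control in zip(cot_patched_scores, control_patched_scores)
    ]


# Toy accuracies per patched layer (made-up values, not results from the paper).
clean = 0.80
cot_patched = [0.78, 0.50, 0.76]      # accuracy after patching CoT-token hidden states
control_patched = [0.79, 0.74, 0.77]  # accuracy after the matched control patch

cmi = cot_mediation_index(clean, cot_patched, control_patched)
print([round(value, 2) for value in cmi])
```

Subtracting the control degradation is what makes the quantity CoT-specific in this reading: a layer where any patch hurts performance equally scores near zero, while a layer where only the CoT patch hurts scores high.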
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Embodied and Extended Cognition
