The Last Word Often Wins: A Format Confound in Chain-of-Thought Corruption Studies
Gabriel Garcia

TL;DR
This paper shows that corruption tests of chain-of-thought faithfulness often measure where the explicit answer text is placed, rather than which reasoning steps actually carry the computation.
Contribution
It demonstrates that answer placement confounds corruption-based faithfulness evaluations and proposes a protocol to mitigate this issue.
Findings
Corruption sensitivity largely depends on explicit answer position.
Conflicting-answer prompts drastically reduce accuracy at smaller model scales.
Final answers are rarely determined early during generation, indicating a readout effect.
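A common way to check whether an answer is fixed early is an "early answering" probe. The chain is truncated at several points, and the model is asked to answer from each prefix. Below is a minimal Python sketch that assumes this probe design; the paper's exact procedure is not given here. `answer_fn` is a hypothetical stand-in for a model call that returns a parsed answer, and `toy_answer_fn` is a crude placeholder used only to make the sketch runnable.

```python
import re
from typing import Callable, List, Optional

def truncated_prefixes(steps: List[str], fractions: List[float]) -> List[str]:
    """Keep the first int(f * len(steps)) reasoning steps for each truncation fraction f."""
    return ["\n".join(steps[: int(f * len(steps))]) for f in fractions]

def early_answer_rate(
    steps: List[str],
    fractions: List[float],
    final_answer: int,
    answer_fn: Callable[[str], Optional[int]],
) -> float:
    """Share of truncation points where the answer given from the prefix already equals the final answer."""
    prefixes = truncated_prefixes(steps, fractions)
    hits = [answer_fn(prefix) == final_answer for prefix in prefixes]
    return sum(hits) / len(hits)

def toy_answer_fn(prefix: str) -> Optional[int]:
    """Hypothetical stand-in for a model: 'answers' with the last integer in the prefix."""
    numbers = re.findall(r"-?\d+", prefix)
    return int(numbers[-1]) if numbers else None

# A GSM8K-style toy chain whose last line is the explicit answer statement.
steps = [
    "Natalia sold 48 clips in April.",
    "In May she sold 48 / 2 = 24 clips.",
    "Altogether she sold 48 + 24 = 72 clips.",
    "#### 72",
]
rate = early_answer_rate(steps, [0.0, 0.25, 0.5, 0.75, 1.0], 72, toy_answer_fn)
print(rate)  # 0.4
```

With a real model, `answer_fn` would append an answer cue to each prefix and parse the reply. A low rate at small truncation fractions is what "rarely determined early" would look like under this probe.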
Abstract
Corruption studies, the standard tool for evaluating chain-of-thought (CoT) faithfulness, infer which steps are "computationally important" from the accuracy loss when steps are corrupted. We show that when benchmark chains end with an explicit terminal answer line, as in GSM8K and MATH, these tests largely measure answer placement rather than where intermediate computation is carried out. Using matched GSM8K examples, removing only the final answer statement while preserving all reasoning collapses suffix sensitivity by about for Qwen 2.5-3B (, ). Conflicting-answer prompts, which contain correct reasoning but a wrong explicit final answer, drive accuracy to zero or near-zero at 7B across five open-weight model families; wrong-answer following is strong at 3B-7B and attenuates sharply at larger scales. Replications on MATH, within-stable…
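The matched-example comparison can be sketched as follows. Each GSM8K reference solution is turned into three variants that share the same reasoning and differ only in the terminal `#### <answer>` line: the intact chain, the chain with that line removed, and a conflicting-answer chain. Suffix sensitivity is then the accuracy drop after corrupting the end of the chain. This is a minimal sketch under stated assumptions, not the paper's implementation. The corruption rule (add 1 to each integer on the last line) and the grader `readout_only` are hypothetical toys. `readout_only` is deliberately crude: it counts a prompt as correct only when the text ends with the gold answer, mimicking pure answer-line readout.

```python
import re
from typing import Callable, Dict, List

# GSM8K reference solutions end with a line of the form "#### <answer>".
ANSWER_LINE = re.compile(r"^####\s*(.+)$", re.MULTILINE)

def make_variants(solution: str, wrong_answer: str) -> Dict[str, str]:
    """Build matched variants that differ only in the terminal answer line."""
    if ANSWER_LINE.search(solution) is None:
        raise ValueError("solution has no '#### <answer>' line")
    return {
        "full": solution,
        # Same reasoning, explicit answer statement removed.
        "no_answer_line": ANSWER_LINE.sub("", solution).rstrip(),
        # Same reasoning, wrong explicit final answer.
        "conflicting": ANSWER_LINE.sub(lambda m: f"#### {wrong_answer}", solution),
    }

def corrupt_last_line(chain: str) -> str:
    """Toy suffix corruption (an assumption): add 1 to every integer on the last line."""
    lines = chain.rstrip().split("\n")
    lines[-1] = re.sub(r"\d+", lambda m: str(int(m.group()) + 1), lines[-1])
    return "\n".join(lines)

def suffix_sensitivity(chains: List[str], is_correct: Callable[[str], bool]) -> float:
    """Accuracy on intact chains minus accuracy after corrupting the suffix."""
    clean = sum(is_correct(c) for c in chains) / len(chains)
    corrupted = sum(is_correct(corrupt_last_line(c)) for c in chains) / len(chains)
    return clean - corrupted

def readout_only(prompt: str) -> bool:
    """Hypothetical toy grader: 'correct' only if the text ends with the gold answer 72."""
    return prompt.rstrip().endswith("72")

solution = "She sold 48 / 2 = 24 clips in May.\nShe sold 48 + 24 = 72 clips in total.\n#### 72"
v = make_variants(solution, "71")
print(suffix_sensitivity([v["full"]], readout_only))            # 1.0
print(suffix_sensitivity([v["no_answer_line"]], readout_only))  # 0.0
```

In a real study, `is_correct` would prompt the model with each chain and grade its answer against the gold label. The point of the matched design is that any difference between the `full` and `no_answer_line` sensitivities can only come from the answer line, because the reasoning text is identical.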
