Measuring Faithfulness in Chain-of-Thought Reasoning
Tamera Lanham, Anna Chen, Ansh Radhakrishnan, Benoit Steiner, Carson Denison, Danny Hernandez, Dustin Li, Esin Durmus, Evan Hubinger, Jackson Kernion, Kamilė Lukošiūtė, Karina Nguyen, Newton Cheng, Nicholas Joseph, Nicholas Schiefer, Oliver Rausch, Robin Larson

TL;DR
This paper tests whether Chain-of-Thought (CoT) reasoning in large language models faithfully reflects how the models actually reach their answers. It finds that reliance on the CoT varies widely across tasks and that larger, more capable models tend to produce less faithful reasoning.
Contribution
It provides an empirical analysis of CoT faithfulness, identifying factors that influence how strongly models rely on their stated reasoning and how faithful that reasoning is.
Findings
Models vary in reliance on CoT across tasks
Larger models tend to produce less faithful reasoning
Faithfulness depends on model size and task conditions
Abstract
Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT (e.g., by adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes relying heavily on the CoT and other times primarily ignoring it. CoT's performance boost does not seem to come from CoT's added test-time compute alone or from information encoded via the particular phrasing of the CoT. As models become larger and more capable, they produce less faithful reasoning…
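The intervention idea in the abstract (edit the CoT, then check whether the final answer moves) can be illustrated with a small sketch. This is not the authors' code or exact metric. It assumes a hypothetical `answer_fn(question, cot)` that returns the model's final answer given a fixed reasoning chain, and it uses hypothetical toy "models" and interventions (`reads_cot`, `ignores_cot`, `add_mistake`, `paraphrase`) in place of a real LLM. A high answer-change rate under an injected mistake suggests the answer conditions on the CoT. An unchanged answer under paraphrasing suggests the specific wording is not carrying hidden information.

```python
import re
from typing import Callable, Sequence

# Hypothetical interface: (question, chain_of_thought) -> final answer.
AnswerFn = Callable[[str, str], str]
# Hypothetical intervention: rewrites a CoT (e.g., injects a mistake or paraphrases it).
Intervention = Callable[[str], str]


def answer_change_rate(
    answer_fn: AnswerFn,
    examples: Sequence[tuple[str, str]],
    intervene: Intervention,
) -> float:
    """Fraction of (question, cot) pairs whose final answer changes after the intervention."""
    if not examples:
        raise ValueError("need at least one example")
    changed = 0
    for question, cot in examples:
        original = answer_fn(question, cot)
        edited = answer_fn(question, intervene(cot))
        if edited != original:
            changed += 1
    return changed / len(examples)


def reads_cot(question: str, cot: str) -> str:
    # Toy "model" that copies the last number stated in its reasoning.
    nums = re.findall(r"-?\d+", cot)
    return nums[-1] if nums else ""


def ignores_cot(question: str, cot: str) -> str:
    # Toy "model" whose answer depends only on the question, never on the CoT.
    return str(len(question))


def add_mistake(cot: str) -> str:
    # Hypothetical corruption: append a step that asserts a wrong answer.
    return cot + " So the answer is 999."


def paraphrase(cot: str) -> str:
    # Hypothetical paraphrase: reword without changing any stated numbers.
    return cot.replace(" is ", " equals ")


examples = [("2+3?", "2 plus 3 is 5."), ("4*2?", "4 times 2 is 8.")]

print(answer_change_rate(reads_cot, examples, add_mistake))    # → 1.0
print(answer_change_rate(ignores_cot, examples, add_mistake))  # → 0.0
print(answer_change_rate(reads_cot, examples, paraphrase))     # → 0.0
```

With a real model, `answer_fn` would prompt the model with the question plus the (possibly edited) reasoning and parse its final answer. The toy functions only show the two extremes the abstract describes: a model that relies heavily on its CoT and one that effectively ignores it.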
Taxonomy
Topics: Topic Modeling · Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI)
