Measuring Faithfulness in Chain-of-Thought Reasoning

Tamera Lanham; Anna Chen; Ansh Radhakrishnan; Benoit Steiner; Carson Denison; Danny Hernandez; Dustin Li; Esin Durmus; Evan Hubinger; Jackson Kernion; Kamilė Lukošiūtė; Karina Nguyen; Newton Cheng; Nicholas Joseph; Nicholas Schiefer; Oliver Rausch; Robin Larson; Sam McCandlish; Sandipan Kundu; Saurav Kadavath; Shannon Yang; Thomas Henighan; Timothy Maxwell; Timothy Telleen-Lawton; Tristan Hume; Zac Hatfield-Dodds; Jared Kaplan; Jan Brauner; Samuel R. Bowman; Ethan Perez

arXiv:2307.13702 · cs.AI · July 27, 2023 · 21 cites

Open Access · 1 Repo · 1 Dataset

TL;DR

This paper investigates whether Chain-of-Thought (CoT) reasoning in large language models faithfully reflects how the model actually reaches its answer, by intervening on the CoT and observing how predictions change. Reliance on the CoT varies widely across tasks, and larger, more capable models produce less faithful reasoning.

Contribution

It provides an empirical analysis of CoT faithfulness, identifying factors (model size and task) that influence how strongly models rely on their stated reasoning when answering.

Findings

01 Models vary in reliance on CoT across tasks

02 Larger models tend to produce less faithful reasoning

03 Faithfulness depends on model size and task conditions

Abstract

Large language models (LLMs) perform better when they produce step-by-step, "Chain-of-Thought" (CoT) reasoning before answering a question, but it is unclear if the stated reasoning is a faithful explanation of the model's actual reasoning (i.e., its process for answering the question). We investigate hypotheses for how CoT reasoning may be unfaithful, by examining how the model predictions change when we intervene on the CoT (e.g., by adding mistakes or paraphrasing it). Models show large variation across tasks in how strongly they condition on the CoT when predicting their answer, sometimes relying heavily on the CoT and other times primarily ignoring it. CoT's performance boost does not seem to come from CoT's added test-time compute alone or from information encoded via the particular phrasing of the CoT. As models become larger and more capable, they produce less faithful reasoning…
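The intervention idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: `answer_fn`, `truncate_cot`, `answer_change_rate`, and the two toy "models" are hypothetical names introduced here for illustration, and truncating the CoT is just one simple intervention standing in for the ones the abstract mentions (adding mistakes, paraphrasing). The sketch measures how often the final answer changes when the CoT is perturbed; a rate near zero would suggest the model largely ignores its stated reasoning.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: (question, chain_of_thought) -> final answer string.
AnswerFn = Callable[[str, str], str]


def truncate_cot(cot: str, keep_fraction: float) -> str:
    """Keep only the first `keep_fraction` of the CoT's sentences."""
    sentences = [s.strip() for s in cot.split(".") if s.strip()]
    keep = int(len(sentences) * keep_fraction)
    return ". ".join(sentences[:keep]) + ("." if keep else "")


def answer_change_rate(
    answer_fn: AnswerFn,
    examples: Sequence[Tuple[str, str]],
    intervene: Callable[[str], str],
) -> float:
    """Fraction of examples whose answer changes after perturbing the CoT."""
    if not examples:
        raise ValueError("need at least one (question, cot) example")
    changed = 0
    for question, cot in examples:
        original = answer_fn(question, cot)
        perturbed = answer_fn(question, intervene(cot))
        changed += original != perturbed
    return changed / len(examples)


# Toy stand-ins for a model (assumptions, for illustration only).
def cot_reader(question: str, cot: str) -> str:
    """Answers with the last word of the CoT, so it depends on the CoT."""
    words = cot.replace(".", " ").split()
    return words[-1] if words else "unknown"


def cot_ignorer(question: str, cot: str) -> str:
    """Always gives the same answer, regardless of the CoT."""
    return "5"


examples = [
    ("What is 2 + 3?", "2 plus 3 is 5. So the answer is 5."),
    ("What is 4 + 1?", "4 plus 1 is 5. So the answer is 5."),
]

# Intervention: drop the whole CoT (keep_fraction = 0.0).
drop_all = lambda cot: truncate_cot(cot, 0.0)
reader_rate = answer_change_rate(cot_reader, examples, drop_all)
ignorer_rate = answer_change_rate(cot_ignorer, examples, drop_all)
```

In this toy setup, `cot_reader` changes its answer on every example once the CoT is removed, while `cot_ignorer` never does. With a real model, `answer_fn` would wrap a call that conditions on the (possibly perturbed) CoT, and `intervene` could be swapped for other perturbations.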

Code & Models

Repositories

technion-cs-nlp/parametric-faithfulness
pytorch

Datasets

richardyoung/cot-faithfulness-open-models
dataset · 450 downloads

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI)