Chain-of-Thought Unfaithfulness as Disguised Accuracy

Oliver Bentham; Nathan Stringham; Ana Marasović

arXiv:2402.14897 · cs.CL · June 24, 2024 · 2 citations

TL;DR

This paper examines a metric, proposed by Lanham et al. (2023), for measuring how well Chain-of-Thought (CoT) generations reflect a model's internal reasoning. After normalization, the metric correlates strongly with accuracy ($R^2$=0.74), which calls into question whether it measures faithfulness at all.

Contribution

The study replicates the scaling experiments of Lanham et al. (2023) across three model families, normalizes the faithfulness metric, and questions the metric's validity by showing that its normalized form correlates strongly with accuracy.

Findings

01. Normalized faithfulness drops for smaller models

02. Strong correlation ($R^2$=0.74) between normalized faithfulness and accuracy

03. Scaling trends are reproducible under specific conditions
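
To make the second finding concrete, the sketch below shows how an $R^2$ value between two per-model scores could be computed. It assumes the reported value comes from a simple least-squares line fit, which this summary does not state. The function name `r_squared` and the example numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line fit of ys on xs.

    For a simple linear fit this equals the squared Pearson correlation.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either sequence is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-model scores, for illustration only (not the paper's data).
accuracy = [0.31, 0.42, 0.55, 0.63, 0.78]
normalized_faithfulness = [0.20, 0.35, 0.30, 0.52, 0.60]
print(f"R^2 = {r_squared(accuracy, normalized_faithfulness):.2f}")
```

A value near 1 would mean that accuracy alone explains most of the variation in the normalized metric, which is the concern the paper raises.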

Abstract

Understanding the extent to which Chain-of-Thought (CoT) generations align with a large language model's (LLM) internal computations is critical for deciding whether to trust an LLM's output. As a proxy for CoT faithfulness, Lanham et al. (2023) propose a metric that measures a model's dependence on its CoT for producing an answer. Within a single family of proprietary models, they find that LLMs exhibit a scaling-then-inverse-scaling relationship between model size and their measure of faithfulness, and that a 13 billion parameter model exhibits increased faithfulness compared to models ranging from 810 million to 175 billion parameters in size. We evaluate whether these results generalize as a property of all LLMs. We replicate the experimental setup in their section focused on scaling experiments with three different families of models and, under specific conditions, successfully…
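
The abstract describes the metric only as a measure of a model's dependence on its CoT for producing an answer. One simple way to picture such a measure is sketched below. It assumes the score is the fraction of questions whose final answer changes when the CoT is withheld; the exact definition and the normalization step are given in Lanham et al. (2023) and in the paper, and are not reproduced here. The function name `cot_dependence` and the answer lists are hypothetical.

```python
def cot_dependence(answers_with_cot, answers_without_cot):
    """Fraction of questions whose answer differs with vs. without CoT.

    Assumption for illustration: a higher value means the model relies more
    on its CoT. This is not the paper's normalized metric.
    """
    if len(answers_with_cot) != len(answers_without_cot):
        raise ValueError("answer lists must have the same length")
    if not answers_with_cot:
        raise ValueError("need at least one question")
    changed = sum(a != b for a, b in zip(answers_with_cot, answers_without_cot))
    return changed / len(answers_with_cot)


# Hypothetical multiple-choice answers for four questions.
with_cot = ["A", "B", "C", "D"]
without_cot = ["A", "C", "C", "A"]
print(cot_dependence(with_cot, without_cot))
```

Under this reading, a model that answers correctly with or without CoT would score low simply because its answers rarely change, which hints at why such a score could track accuracy.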

Code & Models

Repositories

utahnlp/cot_disguised_accuracy (PyTorch, official)

Videos

Chain-of-Thought Unfaithfulness as Disguised Accuracy · SlidesLive

Taxonomy

Topics: Psychology of Moral and Emotional Judgment · Epistemology, Ethics, and Metaphysics

Methods: ALIGN