Dissociation of Faithful and Unfaithful Reasoning in LLMs

Evelyn Yee, Alice Li, Chenyu Tang, Yeon Ho Jung, Ramamohan Paturi, Leon Bergen

arXiv:2405.15092 · cs.AI · September 4, 2024

TL;DR

This paper investigates how large language models recover from errors in their Chain of Thought reasoning. It finds evidence of unfaithfulness, where models reach the correct answer despite invalid reasoning text, and identifies factors that shift this recovery behavior.

Contribution

The paper provides evidence that faithful and unfaithful error recoveries in LLMs are driven by distinct mechanisms, and suggests that selectively targeting these mechanisms may reduce unfaithful reasoning and improve interpretability.

Findings

01. Models recover more frequently from obvious errors.

02. Models recover more frequently in contexts that provide more evidence for the correct answer.

03. Faithful and unfaithful recoveries appear to be driven by distinct mechanisms.

Abstract

Large language models (LLMs) often improve their performance in downstream tasks when they generate Chain of Thought reasoning text before producing an answer. We investigate how LLMs recover from errors in Chain of Thought. Through analysis of error recovery behaviors, we find evidence for unfaithfulness in Chain of Thought, which occurs when models arrive at the correct answer despite invalid reasoning text. We identify factors that shift LLM recovery behavior: LLMs recover more frequently from obvious errors and in contexts that provide more evidence for the correct answer. Critically, these factors have divergent effects on faithful and unfaithful recoveries. Our results indicate that there are distinct mechanisms driving faithful and unfaithful error recoveries. Selective targeting of these mechanisms may be able to drive down the rate of unfaithful reasoning and improve model…
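The distinction the abstract draws can be made concrete with a small bookkeeping sketch. It assumes a hypothetical setup in which each trial is a Chain of Thought containing a known error, annotated with two flags: whether the final answer is correct, and whether the reasoning text explicitly acknowledges and corrects the error. The second flag is an assumption used here to separate faithful from unfaithful recoveries; the `Trial`, `classify`, and `recovery_rates` names are illustrative and are not taken from the paper or its repository.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    NO_RECOVERY = "no_recovery"
    FAITHFUL_RECOVERY = "faithful_recovery"
    UNFAITHFUL_RECOVERY = "unfaithful_recovery"


@dataclass
class Trial:
    """One Chain of Thought that contains a known error (hypothetical schema)."""
    final_answer_correct: bool
    error_acknowledged: bool  # assumed annotation: the text explicitly fixes the error


def classify(trial: Trial) -> Outcome:
    """Label a trial by how (or whether) the model recovered from the error."""
    if not trial.final_answer_correct:
        return Outcome.NO_RECOVERY
    if trial.error_acknowledged:
        return Outcome.FAITHFUL_RECOVERY
    # Correct answer even though the reasoning text remains invalid.
    return Outcome.UNFAITHFUL_RECOVERY


def recovery_rates(trials: list[Trial]) -> dict[Outcome, float]:
    """Fraction of trials falling under each outcome (all zeros if no trials)."""
    counts = Counter(classify(t) for t in trials)
    total = len(trials)
    return {o: (counts[o] / total if total else 0.0) for o in Outcome}
```

Computing these rates separately for each condition (for example, obvious versus subtle errors) is one way to check whether a factor moves the faithful and unfaithful rates in different directions.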

Code & Models

Repositories

coterrorrecovery/coterrorrecovery (Official)

Taxonomy

Topics: Artificial Intelligence in Law · Business Law and Ethics · Dispute Resolution and Class Actions