Dissociation of Faithful and Unfaithful Reasoning in LLMs
Evelyn Yee, Alice Li, Chenyu Tang, Yeon Ho Jung, Ramamohan Paturi, Leon Bergen

TL;DR
This paper investigates how large language models recover from errors in their reasoning. It reveals unfaithful reasoning, in which models produce correct answers despite invalid reasoning text, and explores the factors that influence this behavior.
Contribution
It provides evidence that faithful and unfaithful error recovery in LLMs are driven by distinct mechanisms, and suggests that interventions targeting these mechanisms may reduce unfaithful reasoning and improve interpretability.
Findings
Models recover more often from obvious errors.
More evidence for the correct answer in context improves recovery.
Faithful and unfaithful recoveries are driven by different mechanisms.
Abstract
Large language models (LLMs) often improve their performance in downstream tasks when they generate Chain of Thought reasoning text before producing an answer. We investigate how LLMs recover from errors in Chain of Thought. Through analysis of error recovery behaviors, we find evidence for unfaithfulness in Chain of Thought, which occurs when models arrive at the correct answer despite invalid reasoning text. We identify factors that shift LLM recovery behavior: LLMs recover more frequently from obvious errors and in contexts that provide more evidence for the correct answer. Critically, these factors have divergent effects on faithful and unfaithful recoveries. Our results indicate that there are distinct mechanisms driving faithful and unfaithful error recoveries. Selective targeting of these mechanisms may be able to drive down the rate of unfaithful reasoning and improve model…
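The distinction the abstract draws between faithful recovery, unfaithful recovery, and no recovery can be made concrete with a small bookkeeping sketch. It is not the authors' code: the names `Trial`, `Outcome`, `classify`, and `outcome_rates` are hypothetical, answers are compared by exact string match (a simplifying assumption), and each trial is assumed to already carry a label saying whether the model's continuation explicitly corrects the injected error. How that label is obtained is outside the scope of this sketch.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    FAITHFUL_RECOVERY = "faithful recovery"      # correct answer, error fixed in the text
    UNFAITHFUL_RECOVERY = "unfaithful recovery"  # correct answer, error never fixed in the text
    NO_RECOVERY = "no recovery"                  # wrong final answer


@dataclass
class Trial:
    """Hypothetical record of one chain-of-thought trial with an injected error."""
    gold_answer: str               # the correct final answer
    model_answer: str              # the answer the model produced after the error
    error_corrected_in_text: bool  # assumed label: does the continuation explicitly fix the error?


def classify(trial: Trial) -> Outcome:
    # A wrong final answer counts as no recovery, whatever the text says.
    if trial.model_answer.strip() != trial.gold_answer.strip():
        return Outcome.NO_RECOVERY
    # A correct answer is faithful only if the reasoning text itself repairs the error.
    if trial.error_corrected_in_text:
        return Outcome.FAITHFUL_RECOVERY
    # Correct answer reached despite invalid reasoning text: unfaithful.
    return Outcome.UNFAITHFUL_RECOVERY


def outcome_rates(trials: list[Trial]) -> dict[Outcome, float]:
    """Fraction of trials falling into each outcome category."""
    if not trials:
        raise ValueError("need at least one trial")
    counts = Counter(classify(t) for t in trials)
    return {outcome: counts[outcome] / len(trials) for outcome in Outcome}


if __name__ == "__main__":
    trials = [
        Trial("42", "42", True),   # faithful recovery
        Trial("42", "42", False),  # unfaithful recovery
        Trial("42", "40", False),  # no recovery
        Trial("42", "42", False),  # unfaithful recovery
    ]
    for outcome, rate in outcome_rates(trials).items():
        print(f"{outcome.value}: {rate:.2f}")
```

With the four toy trials above, the sketch reports 0.25 faithful recovery, 0.50 unfaithful recovery, and 0.25 no recovery. Keeping faithful and unfaithful recoveries as separate categories, rather than pooling them into a single "recovery rate", is what allows a factor such as error obviousness to be checked for divergent effects on the two.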
