On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models

Sree Harsha Tanneru; Dan Ley; Chirag Agarwal; Himabindu Lakkaraju

arXiv:2406.10625·cs.CL·July 2, 2024·1 cites

Open Access

TL;DR

This paper investigates the difficulty of ensuring that Large Language Models' Chain-of-Thought reasoning accurately reflects their true behavior, finding that current methods offer only limited improvements across various benchmarks.

Contribution

The study introduces novel strategies for in-context learning, fine-tuning, and activation editing aimed at improving CoT faithfulness, and provides extensive empirical analysis of their effectiveness.

Findings

01 Limited success of strategies in improving faithfulness

02 Marginal improvements from fine-tuning and in-context learning

03 Activation editing showed minimal impact

Abstract

As Large Language Models (LLMs) are increasingly being employed in real-world applications in critical domains such as healthcare, it is important to ensure that the Chain-of-Thought (CoT) reasoning generated by these models faithfully captures their underlying behavior. While LLMs are known to generate CoT reasoning that is appealing to humans, prior studies have shown that these explanations do not accurately reflect the actual behavior of the underlying LLMs. In this work, we explore the promise of three broad approaches commonly employed to steer the behavior of LLMs to enhance the faithfulness of the CoT reasoning generated by LLMs: in-context learning, fine-tuning, and activation editing. Specifically, we introduce novel strategies for in-context learning, fine-tuning, and activation editing aimed at improving the faithfulness of the CoT reasoning. We then carry out extensive…

Taxonomy

Topics: Topic Modeling · Advanced Graph Neural Networks · Natural Language Processing Techniques