On the Intrinsic Self-Correction Capability of LLMs: Uncertainty and Latent Concept

Guangliang Liu; Haitao Mao; Bochuan Cao; Zhiyu Xue; Xitong Zhang; Rongrong Wang; Jiliang Tang; Kristen Johnson

arXiv:2406.02378 · cs.CL · November 11, 2024 · 1 citation

Open Access

TL;DR

This paper investigates the intrinsic self-correction ability of Large Language Models, showing that it improves over iterative interactions and converges to stable performance as model uncertainty decreases and latent concepts are activated.

Contribution

It introduces a mathematical framework and simulation to explain how self-correction converges by reducing uncertainty and activating latent concepts in LLMs.

Findings

01. Intrinsic self-correction improves with iterations in multi-round QA.

02. Iterative instructions reduce model uncertainty and calibration error.

03. Self-correction converges to stable performance through uncertainty reduction.
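The calibration error mentioned in finding 02 is commonly measured as expected calibration error (ECE). The following is a minimal sketch, assuming equal-width confidence bins and a per-answer confidence score in [0, 1]; the function name, the bin count, and the binning scheme are illustrative assumptions, not necessarily the metric configuration used in the paper.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumption: 10 equal-width bins, a common default
) -> float:
    """Weighted average gap between accuracy and mean confidence per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    conf_sum = [0.0] * n_bins  # sum of confidences falling in each bin
    hit_sum = [0.0] * n_bins   # number of correct answers in each bin
    count = [0] * n_bins       # number of answers in each bin

    for c, ok in zip(confidences, correct):
        # Map confidence to a bin index; a confidence of 1.0 goes to the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1

    ece = 0.0
    for s_conf, s_hit, k in zip(conf_sum, hit_sum, count):
        if k == 0:
            continue  # empty bins contribute nothing
        ece += (k / n) * abs(s_hit / k - s_conf / k)
    return ece
```

Under this sketch, one would compute the score separately on the answers from each self-correction round and compare the values across rounds.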

Abstract

Large Language Models (LLMs) are able to improve their responses when instructed to do so, a capability known as self-correction. When instructions provide only the task's goal without specific details about potential issues in the response, LLMs must rely on their internal knowledge to improve response quality, a process referred to as intrinsic self-correction. The empirical success of intrinsic self-correction is evident in various applications, but how and why it is effective remains unknown. In this paper, we unveil that intrinsic self-correction can be progressively improved, allowing it to approach a converged state. Our findings are verified in: (1) the scenario of multi-round question answering, by comprehensively demonstrating that intrinsic self-correction can progressively introduce performance gains through iterative interactions, ultimately converging to stable…
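The multi-round setting described in the abstract can be sketched as a simple loop. This is a hedged illustration only: `generate` is a hypothetical stand-in for any LLM call, the instruction text and prompt layout are made up for the example, and stopping when two consecutive answers match is an assumed proxy for convergence rather than the paper's criterion.

```python
from typing import Callable, List


def self_correct(
    question: str,
    generate: Callable[[str], str],  # hypothetical: maps a prompt to an answer
    max_rounds: int = 5,             # assumption: small fixed round budget
    goal_instruction: str = "Please review your previous answer and improve it.",
) -> List[str]:
    """Run intrinsic self-correction rounds and return the distinct answers in order."""
    history: List[str] = []
    prompt = question
    for _ in range(max_rounds):
        answer = generate(prompt)
        # Assumed convergence proxy: the answer no longer changes between rounds.
        if history and answer == history[-1]:
            break
        history.append(answer)
        # The instruction states only the goal, with no details about specific
        # flaws in the previous answer (the "intrinsic" setting).
        prompt = f"{question}\nPrevious answer: {answer}\n{goal_instruction}"
    return history
```

The instruction deliberately carries no feedback about what is wrong with the earlier answer, which is what separates the intrinsic setting from self-correction with external feedback.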

Taxonomy

Topics: Fault Detection and Control Systems