Large Language Models have Intrinsic Self-Correction Ability
Dancheng Liu, Amir Nassereldine, Ziming Yang, Chenhui Xu, Yuting Hu, Jiajie Li, Utkarsh Kumar, Changjae Lee, Ruiyang Qin, Yiyu Shi, Jinjun Xiong

TL;DR
This paper investigates the intrinsic self-correction ability of large language models, showing that with proper settings like zero temperature and fair prompts, LLMs can effectively revise their answers without external knowledge.
Contribution
The paper provides a theoretical and empirical analysis demonstrating intrinsic self-correction in LLMs and highlights key factors like zero temperature and fair prompts for success.
Findings
Intrinsic self-correction exists across multiple LLMs.
Zero temperature and fair prompts are critical for effective self-correction.
The analysis offers insights into the fundamental theory behind LLM self-correction behavior.
Abstract
Large language models (LLMs) have attracted significant attention for their exceptional abilities in various natural language processing tasks, but they suffer from hallucinations that will cause performance degradation. One promising solution to improve the LLMs' performance is to ask LLMs to revise their answer after generation, a technique known as self-correction. Among the two types of self-correction, intrinsic self-correction is considered a promising direction because it does not utilize external knowledge. However, recent works doubt the validity of LLM's ability to conduct intrinsic self-correction. In this paper, we present a novel perspective on the intrinsic self-correction capabilities of LLMs through theoretical analyses and empirical experiments. In addition, we identify two critical factors for successful self-correction: zero temperature and fair prompts. Leveraging…
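The self-correction procedure described in the abstract can be illustrated with a minimal sketch, assuming a generic `generate(prompt, temperature)` callable. The callable's signature, the wording of `FAIR_REVIEW_PROMPT`, and the `toy_model` stand-in are all hypothetical and are not taken from the paper. The sketch only shows the two factors the abstract names: decoding at zero temperature, and a review prompt that does not suggest the first answer is right or wrong.

```python
from typing import Callable, List

# Hypothetical interface: (prompt, temperature) -> completion text.
Generate = Callable[[str, float], str]

# Illustrative "fair" prompt: it does not hint that the earlier answer
# is right or wrong. The wording is an assumption, not the paper's prompt.
FAIR_REVIEW_PROMPT = (
    "Please review your previous answer. "
    "If it is correct, repeat it; if not, give a corrected answer."
)


def self_correct(question: str, generate: Generate, rounds: int = 1) -> List[str]:
    """Return the initial answer followed by one answer per review round."""
    transcript = f"Question: {question}\nAnswer:"
    # Zero temperature: decoding is meant to be deterministic.
    answers = [generate(transcript, 0.0)]
    for _ in range(rounds):
        # No external knowledge is added; the model only sees its own answer.
        transcript += f" {answers[-1]}\n{FAIR_REVIEW_PROMPT}\nAnswer:"
        answers.append(generate(transcript, 0.0))
    return answers


def toy_model(prompt: str, temperature: float) -> str:
    """Deterministic stand-in for an LLM, used only to exercise the loop."""
    assert temperature == 0.0
    return "4" if "review" in prompt else "5"


if __name__ == "__main__":
    print(self_correct("What is 2 + 2?", toy_model))
```

In practice `toy_model` would be replaced by a call to a real model; the loop itself makes no claim about whether the revised answer is better than the first one.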
Peer Reviews
Decision·Submitted to ICLR 2025
Solid theoretical argument that runs throughout the paper. Though this isn’t my core area it was quite easy to understand and most claims were plausible. The experiments are well connected to the theoretical analysis. Neither part feels like an afterthought, which is a bit rare for LLM reasoning papers :) The discovery of best practices for prompt sets can both be directly applied in engineering work and used to inspire future analysis.
Few, mainly the more philosophical claims I complained about in my summary.
May be interesting to future research.
1. The paper is very difficult to read, particularly the equations. Proof 2.1 is very confusing and many explanations are missing; I could not understand it even after two or three readings. For example, in Line 869, "which we denote as correct(A ∈ Q) = λ > k1". What is λ? Is it the proportion of correctly answered questions out of all questions? And I do not think the probability of that hallucination randomly changing the answer is equal: tokens are of different importance to a sentence (answer), thus, som
1. **Insightful Analysis of Self-Correction Mechanisms**: The paper offers a novel perspective by comparing intrinsic self-correction to chain-of-thought and self-verification techniques. This theoretical framing provides a solid foundation for understanding the mechanisms that enable LLMs to self-correct.
2. **Practical Recommendations for Enhanced Model Performance**: By identifying unbiased prompts and zero temperature as key factors for effective self-correction, the authors present valuable
1. **Limited Performance Improvement Even in Ideal Conditions**: The performance gain from self-correction is relatively small, even under the ideal settings proposed (unbiased prompts and zero temperature). As shown in Tables 1 and 2, the performance gain is less than 2% in many cases. This marginal improvement raises questions about the practical impact of the self-correction process and the techniques provided in the paper.
2. **Lack of Significant Theory and Insight**: The paper
Taxonomy
Topics: Topic Modeling
Methods: Softmax · Attention Is All You Need
