Large Language Models Cannot Self-Correct Reasoning Yet

Jie Huang; Xinyun Chen; Swaroop Mishra; Huaixiu Steven Zheng; Adams Wei Yu; Xinying Song; Denny Zhou

arXiv:2310.01798 · cs.CL · March 15, 2024 · 33 cites

Open Access · 3 Reviews

TL;DR

This paper critically evaluates whether large language models can correct their own reasoning without external feedback. It finds that they struggle to do so and that self-correction sometimes degrades performance.

Contribution

It provides a systematic analysis of intrinsic self-correction, in which an LLM revises its initial response using only its own capabilities, and shows where this fails on reasoning tasks.

Findings

01. LLMs struggle to self-correct without external feedback.

02. Self-correction can sometimes degrade performance.

03. Intrinsic self-correction is currently ineffective for reasoning tasks.

Abstract

Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications. Nevertheless, concerns persist regarding the accuracy and appropriateness of their generated content. A contemporary methodology, self-correction, has been proposed as a remedy to these issues. Building upon this premise, this paper critically examines the role and efficacy of self-correction within LLMs, shedding light on its true potential and limitations. Central to our investigation is the notion of intrinsic self-correction, whereby an LLM attempts to correct its initial responses based solely on its inherent capabilities, without the crutch of external feedback. In the context of reasoning, our research indicates that LLMs struggle to self-correct their responses without external feedback, and at times, their performance even…
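The intrinsic self-correction setting described in the abstract can be sketched as a loop in which the model sees only the question and its own earlier text. This is a minimal illustration, not the paper's implementation: the `generate` callable, the function name `intrinsic_self_correct`, and the wording of both prompts are assumptions made for this example.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt string to a reply.
Generate = Callable[[str], str]

# Illustrative prompt wording (an assumption, not taken from the paper).
CRITIQUE_PROMPT = "Review your previous answer and find problems with it."
REVISE_PROMPT = "Based on the problems you found, improve your answer."


def intrinsic_self_correct(
    question: str, generate: Generate, rounds: int = 1
) -> List[str]:
    """Return the initial answer followed by one revised answer per round.

    No ground-truth label, tool output, or human feedback enters the loop:
    every prompt is built only from the question and the model's own text.
    """
    answers = [generate(question)]
    for _ in range(rounds):
        transcript = f"{question}\n\nAnswer: {answers[-1]}"
        # Step 1: the model critiques its own latest answer.
        critique = generate(f"{transcript}\n\n{CRITIQUE_PROMPT}")
        # Step 2: the model revises that answer using only its own critique.
        revised = generate(
            f"{transcript}\n\nCritique: {critique}\n\n{REVISE_PROMPT}"
        )
        answers.append(revised)
    return answers
```

Comparing the first and last entries of the returned list against reference answers, outside the loop, is one way to measure whether revision helped or hurt. Keeping the labels out of the loop is what makes the setting intrinsic.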

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

The paper tackles a very important topic, has a good literature review section, and uses well-known and trusted datasets to investigate self-correction abilities. In particular, I appreciated the distinction between intrinsic self-correction and self-correction that leverages information from humans or training examples.

Weaknesses

- Only a small set of questions (200) is used on GPT4, the remaining ones apply only to ChatGPT
- In terms of reasoning, there are much more challenging datasets out there
- I found the presentation somewhat confusing since there wasn't a clear description of their methodology (e.g., were all self-correction prompts formulated as the examples in Figure 2, or did variations exist?).
- Also, the wording is unfortunately often trendy rather than clear: From the conclusion: "while LLMs represent a

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

Self-correction of today’s LLMs is a highly significant topic. The paper clearly points out the crucial distinction between self-correction with and without feedback, and sheds light on the latter case (intrinsic self-correction). The usage of oracle labels to terminate self-critique is also examined. The paper’s organization and writing quality is uniformly high, making it a pleasure to read.

Weaknesses

The most serious weakness of the paper is its misleading title, which baldly asserts a claim unsupported by the analysis and results. The words “cannot” and “yet” imply that even today’s most capable LLMs (GPT-4) obtain zero benefit from self-correction in nearly all cases. The abstract quickly tones down the claim by saying “our research indicates that LLMs struggle to self-correct their responses without external feedback”, but even that statement goes beyond what is actually demonstrated by t

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 4

Strengths

* The paper studies an important direction that is now taking over the LLM scene and brings a fresh perspective on how good SoTA LLMs are at detecting their own errors.
* Focusing on *intrinsic self-correction* is much needed in the current "sea" of self-correction papers.
* The experimental design is sound. I liked the random guessing baseline with Commonsense QA.
* I have to say I enjoyed reading the paper: the flow is natural, the writing is good, and most of the arguments are intuitive an

Weaknesses

* I find the explanation in section 3.2.1 (why post-hoc prompting can lead the model to go from a correct to an incorrect answer) unsatisfying. We know that the feedback prompt is changing the model output somehow. The question is *why?* I suggest providing more intuition here.
* The paper discusses the issue but does not provide any hint at a potential solution. I understand this is not the point of the paper but hinting at potential directions to improve intrinsic self-correctness could make the

Taxonomy

Topics: Topic Modeling