Decomposing LLM Self-Correction: The Accuracy-Correction Paradox and Error Depth Hypothesis
Yin Li

TL;DR
This paper analyzes the self-correction abilities of large language models. It reports a paradox in which weaker models correct their own errors at higher rates than stronger ones, and proposes the Error Depth Hypothesis to explain it.
Contribution
It systematically decomposes LLM self-correction into detection, localization, and correction, and introduces the Error Depth Hypothesis to explain the inverse relationship between model strength and correction rate.
Findings
Weaker models achieve higher intrinsic correction rates than stronger models.
Error detection capability varies widely across architectures and does not predict correction success.
Providing error-location hints can reduce correction performance.
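As a rough illustration of the three-way decomposition, the sketch below computes a detection, localization, and correction rate over a set of model errors. The `ErrorRecord` fields, the `rates` helper, and the sample records are hypothetical and chosen for illustration only; the paper's exact metric definitions and data are not given in this listing.

```python
from dataclasses import dataclass


@dataclass
class ErrorRecord:
    """One initially wrong answer and what happened on the self-correction pass.

    All three fields are illustrative assumptions, not the paper's schema.
    """
    detected: bool    # the model flagged its own answer as wrong
    localized: bool   # the model pointed at the faulty reasoning step
    corrected: bool   # the revised answer matched the reference


def rates(records):
    """Return the fraction of errors detected, localized, and corrected."""
    n = len(records)
    if n == 0:
        return {"detection": 0.0, "localization": 0.0, "correction": 0.0}
    return {
        "detection": sum(r.detected for r in records) / n,
        "localization": sum(r.localized for r in records) / n,
        "correction": sum(r.corrected for r in records) / n,
    }


# Made-up records, not data from the paper.
sample = [
    ErrorRecord(detected=True, localized=True, corrected=True),
    ErrorRecord(detected=True, localized=False, corrected=False),
    ErrorRecord(detected=False, localized=False, corrected=True),
    ErrorRecord(detected=False, localized=False, corrected=False),
]
result = rates(sample)
print(result)
```

Keeping the three rates separate makes it possible for detection and correction to diverge, which is the pattern the findings describe: a model can correct an error it never flagged, or flag one it cannot fix.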
Abstract
Large Language Models (LLMs) are widely believed to possess self-correction capabilities, yet recent studies suggest that intrinsic self-correction (where models correct their own outputs without external feedback) remains largely ineffective. In this work, we systematically decompose self-correction into three distinct sub-capabilities: error detection, error localization, and error correction. Through cross-model experiments on GSM8K-Complex (n=500 per model, 346 total errors) with three major LLMs, we uncover a striking Accuracy-Correction Paradox: weaker models (GPT-3.5, 66% accuracy) achieve 1.6x higher intrinsic correction rates than stronger models (DeepSeek, 94% accuracy): 26.8% vs. 16.7%. We propose the Error Depth Hypothesis: stronger models make fewer but deeper errors that resist self-correction. Error detection rates vary dramatically across architectures (10% to 82%), yet…
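The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below is only a sanity check under stated assumptions: the `error_count` helper is hypothetical, and because the reported accuracies are rounded, the per-model error counts it derives are approximate.

```python
def error_count(n_problems, accuracy):
    """Approximate number of wrong answers implied by a rounded accuracy."""
    return round(n_problems * (1 - accuracy))


# Figures as stated in the abstract: n=500 problems per model.
gpt35_errors = error_count(500, 0.66)     # about 170 errors at 66% accuracy
deepseek_errors = error_count(500, 0.94)  # about 30 errors at 94% accuracy

# Ratio of the reported intrinsic correction rates (26.8% vs. 16.7%).
ratio = round(26.8 / 16.7, 1)

print(gpt35_errors, deepseek_errors, ratio)
```

The ratio rounds to the 1.6x quoted in the abstract. Note also that the stronger model's rate rests on far fewer errors, so it is worth keeping sample size in mind when comparing the two percentages.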
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
