Self-Correction as Feedback Control: Error Dynamics, Stability Thresholds, and Prompt Interventions in LLMs
Aofan Liu, Jingxiang Meng

TL;DR
This paper models iterative self-correction in large language models as a feedback-control system, derives a directly measurable stability threshold, and shows how prompt interventions can prevent performance degradation.
Contribution
It introduces a two-state Markov model of error dynamics, establishes a directly measurable stability threshold, and shows empirically that prompt interventions can improve model accuracy.
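The threshold can be derived from a one-round update of expected accuracy. The sketch below reads the Error Introduction Rate (EIR) as the per-round probability that a correct answer becomes incorrect and the Error Correction Rate (ECR) as the per-round probability that an incorrect answer becomes correct. This is our interpretation of the two-state {Correct, Incorrect} model, and we additionally assume both rates stay constant across rounds:

```latex
\begin{aligned}
\mathrm{Acc}_{t+1} &= \mathrm{Acc}_t\,(1-\mathrm{EIR}) + (1-\mathrm{Acc}_t)\,\mathrm{ECR} \\
\mathrm{Acc}_{t+1} - \mathrm{Acc}_t &= (1-\mathrm{Acc}_t)\,\mathrm{ECR} - \mathrm{Acc}_t\,\mathrm{EIR} \\
\mathrm{Acc}_{t+1} > \mathrm{Acc}_t &\iff \frac{\mathrm{ECR}}{\mathrm{EIR}} > \frac{\mathrm{Acc}_t}{1-\mathrm{Acc}_t}
  \qquad (\mathrm{EIR} > 0,\ \mathrm{Acc}_t < 1) \\
\mathrm{Acc}^{*} &= \frac{\mathrm{ECR}}{\mathrm{ECR}+\mathrm{EIR}}
  \qquad \text{(fixed point of the update)}
\end{aligned}
```

Under these assumptions, repeated refinement pushes accuracy toward Acc*, so another round helps exactly when current accuracy is below it. As a hypothetical illustration, not a result from the paper: at Acc_t = 0.95 the right-hand side is 0.95/0.05 = 19, so even an EIR of 0.5% requires an ECR above 9.5% for refinement to help. This suggests why only near-zero EIR leaves room for gains at high baseline accuracy.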
Findings
A sharp, near-zero EIR boundary (< 0.5%) separates beneficial from harmful self-correction.
Prompt interventions can reduce EIR and reverse degradation in GPT-4o-mini.
Adaptive self-consistency halts harmful refinement and reveals a two-tier capability structure.
Abstract
Iterative self-correction is increasingly deployed in agentic LLM systems, yet whether repeated refinement improves or degrades performance varies across models. We recast self-correction as a closed-loop feedback-control problem in which the same model is both controller and plant, and analyze its error dynamics with a two-state Markov model over {Correct, Incorrect}, parameterized by the Error Introduction Rate (EIR) and the Error Correction Rate (ECR). The model yields a directly measurable stability threshold (iterate only when ECR/EIR > Acc/(1 - Acc)), in which EIR acts as a stability margin and prompting becomes lightweight controller design. Empirically, across 7 models and 3 datasets (GSM8K, MATH, StrategyQA), a sharp near-zero EIR boundary (< 0.5%) cleanly separates beneficial from harmful self-correction: only o3-mini (+3.4 pp), Claude Opus 4.6 (+0.6 pp), and o4-mini…
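A minimal Python sketch of how the stated threshold could be checked from data. It assumes per-question correctness labels from before and after one refinement round. It estimates EIR and ECR as the empirical Correct-to-Incorrect and Incorrect-to-Correct transition frequencies and treats both rates as constant across rounds. The helper names (`estimate_rates`, `should_iterate`, `step`, `fixed_point`) and the example labels are hypothetical, not the paper's code or data:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Rates:
    eir: float  # estimated P(incorrect after round | correct before round)
    ecr: float  # estimated P(correct after round | incorrect before round)


def estimate_rates(before: Sequence[bool], after: Sequence[bool]) -> Rates:
    """Estimate EIR and ECR from paired per-question correctness labels (hypothetical helper)."""
    if len(before) != len(after):
        raise ValueError("before and after must be paired per question")
    correct_before = [a for b, a in zip(before, after) if b]
    incorrect_before = [a for b, a in zip(before, after) if not b]
    # EIR: fraction of initially correct answers that the round broke.
    eir = sum(not a for a in correct_before) / len(correct_before) if correct_before else 0.0
    # ECR: fraction of initially incorrect answers that the round fixed.
    ecr = sum(incorrect_before) / len(incorrect_before) if incorrect_before else 0.0
    return Rates(eir=eir, ecr=ecr)


def should_iterate(acc: float, rates: Rates) -> bool:
    """Stability check: iterate only when ECR/EIR > Acc/(1 - Acc)."""
    # Cross-multiplied form: equivalent when EIR > 0 and Acc < 1,
    # and still defined when either quantity hits its boundary.
    return rates.ecr * (1.0 - acc) > rates.eir * acc


def step(acc: float, rates: Rates) -> float:
    """Expected accuracy after one more round under the two-state Markov update."""
    return acc * (1.0 - rates.eir) + (1.0 - acc) * rates.ecr


def fixed_point(rates: Rates) -> float:
    """Accuracy the chain converges to under constant rates: ECR / (ECR + EIR)."""
    total = rates.ecr + rates.eir
    if total == 0:
        raise ValueError("no transitions estimated: accuracy never changes")
    return rates.ecr / total


# Hypothetical labels for 100 questions: 90 correct before the round.
before = [True] * 90 + [False] * 10
# After the round: 2 of the 90 were broken, 3 of the 10 were fixed.
after = [True] * 88 + [False] * 2 + [True] * 3 + [False] * 7

rates = estimate_rates(before, after)
acc = sum(before) / len(before)
print(f"EIR={rates.eir:.4f} ECR={rates.ecr:.4f}")
print("iterate again:", should_iterate(acc, rates))
print(f"next-round accuracy={step(acc, rates):.4f}, fixed point={fixed_point(rates):.4f}")
```

The comparison is cross-multiplied rather than written as a ratio so that it stays defined when EIR is estimated as exactly zero, which can happen on small samples when the true rate is near zero. In practice the estimates carry sampling noise, so a real stopping rule would likely need a margin or confidence interval around the threshold.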
