TL;DR
This paper introduces a probabilistic theory of how LLM accuracy improves over multiple rounds of self-correction, providing a mathematical framework that predicts how accuracy evolves from round to round.
Contribution
It presents a novel probabilistic model explaining the dynamics of accuracy improvement in LLM self-correction, validated by experiments across various models and datasets.
Findings
Accuracy converges exponentially toward an upper bound as the number of self-correction rounds grows.
The model's parameters can be estimated from a single self-correction round, from which accuracy in later rounds is predicted.
Theoretical predictions closely match empirical results.
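To make "exponential convergence" concrete, here is a minimal sketch that assumes accuracy approaches an upper bound Upp by shrinking the remaining gap by a constant factor alpha each round, i.e. Acc_t = Upp - alpha^t * (Upp - Acc_0). The function name and the parameter values below are hypothetical and chosen only for illustration; they are not taken from the paper's experiments.

```python
def accuracy_curve(acc0: float, upp: float, alpha: float, rounds: int) -> list[float]:
    """Return Acc_0..Acc_rounds under the assumed form Acc_t = Upp - alpha**t * (Upp - Acc_0)."""
    # The gap to Upp shrinks geometrically only when |alpha| < 1.
    if not abs(alpha) < 1.0:
        raise ValueError("need |alpha| < 1 for convergence")
    return [upp - alpha**t * (upp - acc0) for t in range(rounds + 1)]


# Illustrative (made-up) parameters.
curve = accuracy_curve(acc0=0.5, upp=0.75, alpha=0.6, rounds=4)
gaps = [0.75 - a for a in curve]
# Ratio of successive gaps to the upper bound; each is about 0.6 (= alpha).
ratios = [g1 / g0 for g0, g1 in zip(gaps, gaps[1:])]
print(curve)
print(ratios)
```

A constant ratio between successive gaps is what distinguishes exponential convergence from, say, a linear or logarithmic approach to the bound.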
Abstract
Large Language Models (LLMs) have demonstrated the capability to refine their generated answers through self-correction, enabling continuous performance improvement over multiple rounds. However, the mechanisms underlying how and why accuracy evolves during this iterative process remain unexplored. To fill this gap, we propose a probabilistic theory to model the dynamics of accuracy change and explain the performance improvements observed in multi-round self-correction. Through mathematical derivation, we establish that the accuracy after the $t$-th round of self-correction is given by $Acc_t = Upp - \alpha^t (Upp - Acc_0)$, where $Acc_0$ denotes the initial accuracy, $Upp$ represents the upper bound of accuracy convergence, and $\alpha$ determines the rate of convergence. Based on our theory, these parameters can be calculated and the predicted accuracy curve can then be obtained…
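The abstract states that the parameters can be calculated, but the derivation is not reproduced here. The sketch below assumes one simple per-question model that yields an exponential curve of this kind: in each round a correct answer stays correct with probability p_keep and an incorrect answer becomes correct with probability p_fix (both names are hypothetical, not the paper's notation). Under that assumption Acc_t = p_fix + (p_keep - p_fix) * Acc_{t-1}, so the convergence factor is p_keep - p_fix and the upper bound is the fixed point p_fix / (1 - p_keep + p_fix). Both probabilities can be estimated by comparing per-question correctness before and after a single round. The toy data are invented for illustration and are not results from the paper.

```python
from dataclasses import dataclass


@dataclass
class SelfCorrectionParams:
    acc0: float    # initial accuracy
    p_keep: float  # assumed P(correct answer stays correct in one round)
    p_fix: float   # assumed P(incorrect answer becomes correct in one round)

    @property
    def alpha(self) -> float:
        # Per-round factor by which the gap to the upper bound shrinks.
        return self.p_keep - self.p_fix

    @property
    def upp(self) -> float:
        # Fixed point of acc -> p_fix + alpha * acc.
        denom = 1.0 - self.p_keep + self.p_fix
        if denom == 0.0:
            raise ValueError("p_keep == 1 and p_fix == 0: accuracy never moves")
        return self.p_fix / denom


def estimate_from_one_round(before: list[bool], after: list[bool]) -> SelfCorrectionParams:
    """Estimate the hypothetical parameters from per-question correctness before/after one round."""
    if len(before) != len(after) or not before:
        raise ValueError("need two equal-length, non-empty correctness lists")
    n_correct = sum(before)
    n_wrong = len(before) - n_correct
    if n_correct == 0 or n_wrong == 0:
        raise ValueError("need both correct and incorrect initial answers")
    kept = sum(b and a for b, a in zip(before, after))
    fixed = sum((not b) and a for b, a in zip(before, after))
    return SelfCorrectionParams(
        acc0=n_correct / len(before),
        p_keep=kept / n_correct,
        p_fix=fixed / n_wrong,
    )


def predict_curve(params: SelfCorrectionParams, rounds: int) -> list[float]:
    """Iterate the assumed one-step recurrence to predict Acc_0..Acc_rounds."""
    accs = [params.acc0]
    for _ in range(rounds):
        prev = accs[-1]
        accs.append(prev * params.p_keep + (1.0 - prev) * params.p_fix)
    return accs


# Toy data: 10 of 20 answers correct initially; after one round,
# 9 of those stay correct and 3 of the 10 wrong ones get fixed.
before = [True] * 10 + [False] * 10
after = [True] * 9 + [False] + [True] * 3 + [False] * 7
params = estimate_from_one_round(before, after)
print(params.alpha, params.upp)   # about 0.6 and 0.75 for this toy data
print(predict_curve(params, 3))   # roughly [0.5, 0.6, 0.66, 0.696]
```

Iterating the recurrence gives the same values as the closed form Upp - alpha^t * (Upp - Acc_0), since the gap to the fixed point is multiplied by alpha at every step.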