A Task-Centric Theory for Iterative Self-Improvement with Easy-to-Hard Curricula
Chenruo Liu, Yijun Dong, Yiqiu Shen, Qi Lei

TL;DR
This paper develops a theoretical framework for iterative self-improvement of large language models. It provides finite-sample guarantees for reward-based fine-tuning and analyzes when easy-to-hard task curricula improve learning.
Contribution
It introduces a task-centric theory of self-improvement and derives conditions under which easy-to-hard curricula outperform fixed task mixtures, with supporting experiments.
Findings
Better models accept more reward-verified data per iteration, which sustains self-improvement and also explains its eventual saturation.
Easy-to-hard curricula can provably outperform fixed task mixtures under certain conditions.
Finite-sample guarantees are established for reward-optimized fine-tuning in iterative settings.
Abstract
Iterative self-improvement fine-tunes an autoregressive large language model (LLM) on reward-verified outputs generated by the LLM itself. Despite the empirical success of self-improvement, the theoretical foundations of this generative, iterative procedure in a practical, finite-sample setting remain limited. We make progress toward this goal by modeling each round of self-improvement as maximum-likelihood fine-tuning on a reward-filtered distribution and deriving finite-sample guarantees for the expected reward. Our analysis reveals an explicit feedback loop where better models accept more data per iteration, supporting sustained self-improvement while explaining eventual saturation of such improvement. Adopting a task-centric view by considering reasoning tasks with multiple difficulty levels, we further prove quantifiable conditions on model initialization, task difficulty,…
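The feedback loop described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under our own assumptions, not the paper's model or experimental setup. It uses a single prompt, a softmax "model" over 10 hypothetical candidate answers, and a binary reward (1 only for the answer in `correct`). One SGD step on the log-likelihood of each accepted sample stands in for maximum-likelihood fine-tuning on the reward-filtered distribution. The names `self_improve`, `lr`, `n_samples`, and `rounds` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def self_improve(logits, correct, rounds=6, n_samples=200, lr=0.05, seed=0):
    """Toy iterative self-improvement on a single prompt (hypothetical setup).

    Each round: sample n_samples answers from the current model, keep only
    those with binary reward 1 (answer in `correct`), then take one SGD step
    on the log-likelihood of each accepted answer. Returns the final logits
    and a per-round history of (accepted_count, expected_reward), where the
    expected reward is measured before that round's update.
    """
    rng = random.Random(seed)
    logits = list(logits)
    history = []
    for _ in range(rounds):
        probs = softmax(logits)
        expected_reward = sum(probs[y] for y in correct)
        samples = rng.choices(range(len(logits)), weights=probs, k=n_samples)
        accepted = [y for y in samples if y in correct]  # reward filter
        history.append((len(accepted), expected_reward))
        for y in accepted:
            p = softmax(logits)
            # Gradient of log p(y) with respect to the logits is onehot(y) - p.
            logits = [z + lr * ((1.0 if k == y else 0.0) - p[k])
                      for k, z in enumerate(logits)]
    return logits, history


if __name__ == "__main__":
    # 10 candidate answers, only answer 0 is correct: initial expected reward 0.1.
    _, hist = self_improve([0.0] * 10, correct={0})
    for t, (n_acc, r) in enumerate(hist):
        print(f"round {t}: accepted={n_acc:3d}  expected reward={r:.3f}")
```

In this toy, the number of accepted samples per round tracks the current expected reward. A better model therefore receives more fine-tuning data and more update steps. Each step's gradient `1 - p(correct)` shrinks as the reward approaches 1, so the gains flatten out. This qualitatively mirrors the feedback loop and saturation described above, but it makes no claim about the paper's actual bounds.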
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Intelligent Tutoring Systems and Adaptive Learning
