Self-Play Only Evolves When Self-Synthetic Pipeline Ensures Learnable Information Gain
Wei Liu, Siya Qi, Yali Du, Yulan He

TL;DR
Through experiments on a self-play coding task, this paper shows that sustainable self-evolution of large language models requires a self-synthetic data pipeline whose learnable information increases across iterations, organised around three roles: a Proposer, a Solver, and a Verifier.
Contribution
It introduces a triadic-roles framework (Proposer, Solver, Verifier) and three system designs that jointly target learnable information gain, with the aim of sustaining the self-evolution of LLMs.
Findings
Asymmetric co-evolution closes a weak-to-strong-to-weak loop across roles, increasing learnable information for each of them.
Capacity growth expands model capacity in step with the increasing learnable information.
Proactive information seeking prevents saturation and supports continuous improvement.
Abstract
Large language models (LLMs) make it plausible to build systems that improve through self-evolving loops, but many existing proposals are better understood as self-play and often plateau quickly. A central failure mode is that the loop synthesises more data without increasing the learnable information available to the next iteration. Through experiments on a self-play coding task, we show that sustainable self-evolution requires a self-synthesised data pipeline whose learnable information increases across iterations. We identify three roles that self-evolving LLMs play: the Proposer, which generates tasks; the Solver, which attempts solutions; and the Verifier, which provides training signals. From this triadic-roles perspective, we identify three system designs that jointly target learnable information gain. Asymmetric co-evolution closes a weak-to-strong-to-weak loop across roles.…
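The triadic-roles loop described in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' pipeline: integer addition stands in for the paper's coding tasks, the Solver is a hypothetical logistic skill model, and `learnability` uses the solve-rate variance p(1 - p) as an assumed proxy for learnable information (the paper's own measure is not stated here). All function names and update rules (`propose`, `solve`, `verify`, `self_play`, the 0.7 threshold) are hypothetical.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Toy task (hypothetical): add two integers, standing in for a coding problem."""
    a: int
    b: int

    def answer(self) -> int:
        return self.a + self.b


def propose(rng: random.Random, difficulty: int, n: int) -> list[Task]:
    """Proposer: generate n tasks whose operands have up to `difficulty` digits."""
    hi = 10 ** difficulty
    return [Task(rng.randrange(hi), rng.randrange(hi)) for _ in range(n)]


def solve(rng: random.Random, task: Task, skill: float, difficulty: int) -> int:
    """Solver (toy): correct with probability sigmoid(skill - difficulty), else off by one."""
    p_correct = 1.0 / (1.0 + math.exp(difficulty - skill))
    return task.answer() if rng.random() < p_correct else task.answer() + 1


def verify(task: Task, guess: int) -> float:
    """Verifier: binary reward used as the training signal."""
    return 1.0 if guess == task.answer() else 0.0


def learnability(rewards: list[float]) -> float:
    """Assumed proxy for learnable information: p * (1 - p) of the solve rate p.

    Zero when a task is always or never solved; maximal (0.25) at p = 0.5.
    """
    p = sum(rewards) / len(rewards)
    return p * (1.0 - p)


def self_play(iterations: int = 5, n_tasks: int = 32, k: int = 8, seed: int = 0):
    """Run the toy loop; return (difficulty, skill, mean learnability) per iteration."""
    rng = random.Random(seed)
    skill, difficulty = 1.0, 1
    history = []
    for _ in range(iterations):
        tasks = propose(rng, difficulty, n_tasks)
        scores, solve_rates = [], []
        for task in tasks:
            # k Solver attempts per task, each scored by the Verifier.
            rewards = [verify(task, solve(rng, task, skill, difficulty)) for _ in range(k)]
            scores.append(learnability(rewards))
            solve_rates.append(sum(rewards) / k)
        mean_info = sum(scores) / len(scores)
        history.append((difficulty, skill, mean_info))
        # Solver update (toy): skill grows in proportion to the learnable signal.
        skill += 4.0 * mean_info
        # Proposer update (toy): raise difficulty once the Solver mostly succeeds,
        # so the next batch does not saturate (solve rate near 1, learnability near 0).
        if sum(solve_rates) / len(solve_rates) > 0.7:
            difficulty += 1
    return history


if __name__ == "__main__":
    for d, s, info in self_play():
        print(f"difficulty={d} skill={s:.2f} learnable_info={info:.3f}")
```

The p(1 - p) proxy makes the abstract's failure mode concrete: if the Proposer keeps emitting tasks the Solver always (or never) solves, the loop still synthesises data, but each task contributes zero learnable signal and the Solver's toy skill stops growing.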
