Guided Self-Evolving LLMs with Minimal Human Supervision
Wenhao Yu, Zhenwen Liang, Chengsong Huang, Kishan Panaganti, Tianqing Fang, Haitao Mi, Dong Yu

TL;DR
This paper introduces R-Few, a guided self-evolving framework for large language models that combines minimal human supervision with self-play, leading to stable, iterative improvements in reasoning tasks.
Contribution
The paper presents R-Few, a novel framework that enables stable, guided self-evolution of LLMs with minimal human input, addressing issues like concept drift and bias reinforcement.
Findings
R-Few significantly improves performance on math and reasoning benchmarks.
It achieves performance comparable to models trained on far more human-labeled data.
Ablation studies show the effectiveness of grounded training and curriculum learning.
Abstract
AI self-evolution has long been envisioned as a path toward superintelligence, where models autonomously acquire, refine, and internalize knowledge from their own learning experiences. Yet in practice, unguided self-evolving systems often plateau quickly or even degrade as training progresses. These failures arise from issues such as concept drift, diversity collapse, and mis-evolution, as models reinforce their own biases and converge toward low-entropy behaviors. To enable models to self-evolve in a stable and controllable manner while minimizing reliance on human supervision, we introduce R-Few, a guided Self-Play Challenger-Solver framework that incorporates lightweight human oversight through in-context grounding and mixed training. At each iteration, the Challenger samples a small set of human-labeled examples to guide synthetic question generation, while the Solver jointly trains…
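The abstract describes each iteration only at a high level, and it is truncated. The Python sketch below illustrates the two data-side steps it names: in-context grounding, where the Challenger conditions question generation on a few sampled human-labeled examples, and mixed training, where the Solver sees human-labeled and synthetic questions together. Everything concrete here is a hypothetical illustration, not the authors' implementation: the function names, the prompt template, the number of sampled examples `k`, and the `human_fraction` mixing ratio. Model calls and training updates are omitted.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    question: str
    answer: Optional[str]  # None for synthetic questions that lack a human label


def sample_anchors(human_pool: List[Example], k: int, rng: random.Random) -> List[Example]:
    """Hypothetical Challenger step: draw k distinct human-labeled examples as grounding."""
    return rng.sample(human_pool, k)


def build_challenger_prompt(anchors: List[Example]) -> str:
    """Format the sampled anchors as in-context examples (hypothetical template)."""
    shots = "\n\n".join(f"Example question: {ex.question}" for ex in anchors)
    return f"{shots}\n\nWrite a new question in the same style and domain:"


def mixed_batch(
    human_pool: List[Example],
    synthetic: List[Example],
    batch_size: int,
    human_fraction: float,
    rng: random.Random,
) -> List[Example]:
    """Hypothetical Solver step: mix human-labeled and synthetic questions in one batch."""
    n_human = round(batch_size * human_fraction)
    batch = rng.sample(human_pool, n_human) + rng.sample(synthetic, batch_size - n_human)
    rng.shuffle(batch)  # avoid a fixed human/synthetic ordering within the batch
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    human = [Example(f"human question {i}", str(i)) for i in range(10)]
    synthetic = [Example(f"synthetic question {i}", None) for i in range(20)]
    print(build_challenger_prompt(sample_anchors(human, 3, rng)))
    batch = mixed_batch(human, synthetic, batch_size=8, human_fraction=0.25, rng=rng)
    print(sum(ex.answer is not None for ex in batch), "human-labeled of", len(batch))
```

Keeping the human share as an explicit parameter makes the amount of supervision easy to vary; how R-Few actually sets or schedules this mix is not stated in the excerpt above.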
Taxonomy
Topics: Topic Modeling · Machine Learning and Data Classification · Domain Adaptation and Few-Shot Learning
