Preventing Curriculum Collapse in Self-Evolving Reasoning Systems
Vaibhav Mishra

TL;DR
This paper introduces Prism, a question-centric method that prevents diversity collapse in self-evolving reasoning systems, so that they keep generating diverse, challenging mathematical problems and improve reasoning accuracy (up to +3.98 points over R-Zero on AMC).
Contribution
Prism is a question-centric approach that maintains semantic diversity in self-evolving systems by using a persistent coverage signal and a ZPD gate. It targets the question-generation side, where prior work has largely focused on solver-side optimisation and verification.
Findings
Prism outperforms five baselines on seven mathematical benchmarks.
Prism generates a diverse set of 100k mathematical questions.
Prism achieves up to +3.98 accuracy points over R-Zero on AMC.
Abstract
Self-evolving reasoning frameworks let LLMs improve their reasoning capabilities by iteratively generating and solving problems without external supervision, using verifiable rewards. Ideally, such systems are expected to explore a diverse problem space and propose new challenges of high learning value. While prior work has largely focused on solver-side optimisation and verification, recent evidence suggests that self-evolving systems can exhibit diversity collapse in posing new problems after just a few iterations, even when surface-level variation is preserved. We introduce Prism, a question-centric self-evolution method that directly tackles this collapse. Prism defines a persistent diversity signal over an embedding-induced semantic partition of mathematical problems and uses it to encourage balanced exploration of underrepresented regions across iterations. This coverage signal is…
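The coverage idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the nearest-centroid partition, the `CoverageTracker` class, and the inverse-square-root bonus are all assumptions chosen for illustration. The abstract only states that a persistent signal over an embedding-induced partition encourages exploration of underrepresented regions.

```python
import math
from collections import Counter


def nearest_cell(embedding, centroids):
    """Return the index of the centroid closest to `embedding` (squared Euclidean)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((e - c) ** 2 for e, c in zip(embedding, centroids[i])),
    )


class CoverageTracker:
    """Hypothetical helper: counts questions per semantic cell across iterations."""

    def __init__(self, centroids):
        # Centroids define the partition; here they are assumed to be given.
        self.centroids = centroids
        # Counts persist across iterations, so the signal does not reset.
        self.counts = Counter()

    def bonus(self, embedding):
        """Assumed bonus form: larger for cells that have been visited less often."""
        cell = nearest_cell(embedding, self.centroids)
        return 1.0 / math.sqrt(1.0 + self.counts[cell])

    def record(self, embedding):
        """Register a generated question and return the cell it fell into."""
        cell = nearest_cell(embedding, self.centroids)
        self.counts[cell] += 1
        return cell


# Toy usage with 2-D "embeddings" and two cells.
tracker = CoverageTracker([[0.0, 0.0], [10.0, 10.0]])
for _ in range(3):
    tracker.record([0.1, 0.2])          # three questions land in cell 0
crowded = tracker.bonus([0.2, 0.1])     # bonus for the crowded cell
sparse = tracker.bonus([9.0, 9.0])      # bonus for the unvisited cell
```

In this toy setup a question falling in the unvisited cell receives a larger bonus than one falling in the crowded cell, which is the balancing behaviour the abstract describes. A real system would obtain embeddings from a text encoder and fit the partition with a clustering method.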
Taxonomy
Topics: Evolutionary Algorithms and Applications · Machine Learning in Materials Science · Teaching and Learning Programming
