Vocabulary Dropout for Curriculum Diversity in LLM Co-Evolution
Jacob Dineen, Aswin RRV, Zhikun Xu, Ben Zhou

TL;DR
This paper introduces vocabulary dropout, a simple technique to maintain diversity in co-evolutionary language models, leading to improved reasoning performance and more varied problem generation.
Contribution
It proposes vocabulary dropout as a lightweight method to prevent diversity collapse in language model co-evolution, enhancing curriculum quality and solver accuracy.
Findings
Vocabulary dropout sustains the proposer's lexical, semantic, and functional diversity.
It improves solver performance by an average of +4.4 points on benchmarks.
The method prevents proposers from converging to narrow token sequences.
Abstract
Co-evolutionary self-play, where one language model generates problems and another solves them, promises autonomous curriculum learning without human supervision. In practice, the proposer quickly converges to a narrow distribution of problems that satisfy the reward function. This diversity collapse renders the curriculum uninformative for the solver, stalling the co-evolutionary loop. We introduce vocabulary dropout, a random mask applied to the proposer's output logits during both policy training and curriculum generation, as a lightweight mechanism to sustain diversity. The mask is hard and non-stationary, preventing the proposer from locking into fixed token sequences. Training Qwen3-4B and Qwen3-8B on mathematical reasoning via R-Zero, we find that vocabulary dropout sustains proposer diversity across lexical, semantic, and functional metrics throughout training, and yields solver…
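The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract states only that the mask is random, hard, non-stationary, and applied to the proposer's output logits. The drop rate, the choice to redraw the mask on every call, the `protected_ids` argument, and all function names below are assumptions made for illustration.

```python
import math
import random


def sample_vocab_mask(vocab_size, drop_rate, rng, protected_ids=()):
    """Draw a fresh keep-mask over the vocabulary (True = token stays available).

    drop_rate and protected_ids are hypothetical parameters; protecting a
    token such as end-of-sequence is an assumption, not taken from the paper.
    """
    keep = [rng.random() >= drop_rate for _ in range(vocab_size)]
    for token_id in protected_ids:
        keep[token_id] = True
    return keep


def apply_vocab_dropout(logits, keep):
    """Hard mask: dropped tokens get a logit of -inf, so softmax assigns them 0."""
    return [x if k else -math.inf for x, k in zip(logits, keep)]


def softmax(logits):
    """Numerically stable softmax; assumes at least one logit is finite."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
vocab_size = 200
logits = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]

# Redrawing the mask on each call is one way to make it non-stationary.
keep_a = sample_vocab_mask(vocab_size, 0.3, rng, protected_ids=(0,))
keep_b = sample_vocab_mask(vocab_size, 0.3, rng, protected_ids=(0,))

probs = softmax(apply_vocab_dropout(logits, keep_a))
```

Because dropped tokens receive exactly zero probability, the proposer cannot emit them under the current mask, and because the mask changes between draws, a token sequence that works under one mask may be unavailable under the next. How often the mask is redrawn in the actual method (per token, per sequence, or per training step) is not specified in the text above.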
