A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula
Cansu Sancaktar, David Zhang, Gabriel Synnaeve, Taco Cohen

TL;DR
This paper introduces a scalable synthetic data generation pipeline with curricula for reinforcement learning in code generation, improving performance and training dynamics of large language models.
Contribution
It presents a multi-turn synthetic data generation method that increases data diversity and structure without fine-tuning the teacher model, and that supports curriculum-based RL training.
Findings
Synthetic data augmentation improves in-domain code performance.
Curriculum design and data diversity jointly influence RL training outcomes.
Multi-turn generation yields more valid and structured problems than single-turn methods.
Abstract
Reinforcement learning (RL) has emerged as a powerful paradigm for improving large language models beyond supervised fine-tuning, yet sustaining performance gains at scale remains an open challenge, as data diversity and structure, rather than volume alone, become the limiting factor. We address this by introducing a scalable multi-turn synthetic data generation pipeline in which a teacher model iteratively refines problems based on in-context student performance summaries, producing structured difficulty progressions without any teacher fine-tuning. Compared to single-turn generation, this multi-turn approach substantially improves the yield of valid synthetic problems and naturally produces stepping stones, i.e. easier and harder variants of the same core task, that support curriculum-based training. We systematically study how task difficulty, curriculum scheduling, and environment…
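The refinement loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `Problem`, `summarize`, `refine_chain`, the pass-rate thresholds, and the toy teacher and student functions are all hypothetical stand-ins, and sorting the resulting chain by difficulty is an assumed way of turning the stepping stones into a curriculum.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Problem:
    statement: str
    difficulty: int  # hypothetical scalar difficulty, for illustration only


def summarize(pass_rate: float) -> str:
    """Turn a student pass rate into a short summary shown to the teacher."""
    if pass_rate > 0.8:   # assumed threshold
        return "too easy"
    if pass_rate < 0.2:   # assumed threshold
        return "too hard"
    return "well matched"


def refine_chain(
    seed: Problem,
    teacher: Callable[[Problem, str], Problem],
    student_pass_rate: Callable[[Problem], float],
    turns: int = 4,
) -> List[Problem]:
    """Run a multi-turn loop: the teacher rewrites the current problem
    using only an in-context summary of how the student performed on it."""
    chain = [seed]
    current = seed
    for _ in range(turns):
        feedback = summarize(student_pass_rate(current))
        current = teacher(current, feedback)
        chain.append(current)
    # Assumed curriculum ordering: easiest variant first.
    return sorted(chain, key=lambda p: p.difficulty)


# Toy stand-ins so the sketch runs without any model.
def toy_teacher(problem: Problem, feedback: str) -> Problem:
    step = {"too easy": 2, "too hard": -2, "well matched": 1}[feedback]
    d = max(1, problem.difficulty + step)
    return Problem(f"variant at difficulty {d}", d)


def toy_student_pass_rate(problem: Problem) -> float:
    return max(0.0, 1.0 - problem.difficulty / 10.0)


if __name__ == "__main__":
    for p in refine_chain(Problem("seed task", 1), toy_teacher, toy_student_pass_rate):
        print(p.difficulty, p.statement)
```

In a real setting the teacher would be a language model prompted with the summary, and the pass rate would come from running student solutions against tests; both are replaced here by deterministic functions so that the control flow is easy to follow.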
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
