A Deep Dive into Scaling RL for Code Generation with Synthetic Data and Curricula

Cansu Sancaktar, David Zhang, Gabriel Synnaeve, Taco Cohen

arXiv:2603.24202 · cs.LG · March 26, 2026

TL;DR

This paper introduces a scalable synthetic data generation pipeline with curricula for reinforcement learning in code generation, improving performance and training dynamics of large language models.

Contribution

The paper presents a multi-turn synthetic data generation method in which a teacher model iteratively refines problems. The method improves data diversity and structure without fine-tuning the teacher, and it supports curriculum-based RL training.

Findings

1. Synthetic data augmentation improves in-domain code performance.
2. Curriculum design and data diversity jointly influence RL training outcomes.
3. Multi-turn generation yields more valid and structured problems than single-turn methods.

Abstract

Reinforcement learning (RL) has emerged as a powerful paradigm for improving large language models beyond supervised fine-tuning, yet sustaining performance gains at scale remains an open challenge, as data diversity and structure, rather than volume alone, become the limiting factor. We address this by introducing a scalable multi-turn synthetic data generation pipeline in which a teacher model iteratively refines problems based on in-context student performance summaries, producing structured difficulty progressions without any teacher fine-tuning. Compared to single-turn generation, this multi-turn approach substantially improves the yield of valid synthetic problems and naturally produces stepping stones, i.e. easier and harder variants of the same core task, that support curriculum-based training. We systematically study how task difficulty, curriculum scheduling, and environment…
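The multi-turn loop described in the abstract can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the teacher and student are deterministic stubs, and `Problem`, `student_pass_rate`, `teacher_refine`, the integer difficulty scale, and the target pass-rate band `[low, high]` are all hypothetical. In a real pipeline the teacher would be an LLM prompted with the performance summary, and the pass rate would come from running the student policy against unit tests. The sketch only shows the control flow: the student attempts a problem, the teacher sees a summary of past attempts in context, the problem is made easier or harder, and every intermediate variant is kept as a stepping stone that can later be ordered from easy to hard for a curriculum.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    core_task: str   # the underlying task shared by all variants
    difficulty: int  # hypothetical integer difficulty level (toy assumption)


def student_pass_rate(problem: Problem, attempts: int = 8) -> float:
    """Toy stand-in for sampling the student `attempts` times and running tests.
    Assumption: each difficulty level costs exactly one success."""
    successes = min(attempts, max(0, attempts - problem.difficulty))
    return successes / attempts


def summarize(history: list[tuple[Problem, float]]) -> str:
    """In-context performance summary shown to the teacher (hypothetical format)."""
    return "\n".join(f"difficulty={p.difficulty} pass_rate={r:.2f}" for p, r in history)


def teacher_refine(problem: Problem, summary: str, rate: float,
                   low: float, high: float) -> Problem:
    """Toy teacher. A real teacher LLM would read `summary`; this stub ignores it
    and just moves difficulty toward the target pass-rate band [low, high]."""
    if rate > high:   # too easy for the student: propose a harder variant
        return Problem(problem.core_task, problem.difficulty + 1)
    if rate < low:    # too hard for the student: propose an easier variant
        return Problem(problem.core_task, problem.difficulty - 1)
    return problem    # inside the band: accept the problem as-is


def generate_stepping_stones(seed: Problem, turns: int = 6,
                             low: float = 0.25, high: float = 0.75):
    """Run the multi-turn teacher/student loop, keeping every variant produced."""
    history: list[tuple[Problem, float]] = []
    problem = seed
    for _ in range(turns):  # cap turns so an oscillating teacher cannot loop forever
        rate = student_pass_rate(problem)
        history.append((problem, rate))
        refined = teacher_refine(problem, summarize(history), rate, low, high)
        if refined == problem:  # teacher accepted the current variant
            break
        problem = refined
    return history


def easy_to_hard(stones: list[tuple[Problem, float]]):
    """Order unique variants by descending pass rate, i.e. an easy-to-hard curriculum."""
    unique = {p: r for p, r in stones}
    return sorted(unique.items(), key=lambda pr: -pr[1])


stones = generate_stepping_stones(Problem("two-sum", 8))
for p, r in easy_to_hard(stones):
    print(p.difficulty, r)
```

Starting from a seed that is too hard (difficulty 8, pass rate 0.0), the stub teacher steps down through difficulty 7 (0.125) to difficulty 6 (0.25), where it stops because the pass rate enters the target band. Ordering the three variants by pass rate gives the easy-to-hard sequence 6, 7, 8. Which variants to keep and how to schedule them during RL are exactly the curriculum and difficulty questions the paper studies; the fixed band and greedy ordering here are placeholder choices.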

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications