DRIVE: Data Curation Best Practices for Reinforcement Learning with Verifiable Reward in Competitive Code Generation

Speed Zhu; Jianwei Cai; Guang Chen; Lulu Wu; Saiyong Yang; Wiggin Zhou

arXiv:2511.06307 · cs.LG · November 11, 2025

TL;DR

This paper presents a data curation and training pipeline for reinforcement learning with verifiable rewards (RLVR) in competitive-programming code generation, reporting state-of-the-art results on LeetCode and Codeforces.

Contribution

It introduces practical data curation strategies, a two-stage RL training process with curriculum design, and demonstrates strong performance on competitive programming benchmarks.

Findings

01 State-of-the-art performance on LeetCode and Codeforces

02 Effective use of curriculum and data curation in RLVR

03 Strong scaling observed on large-scale models

Abstract

Recent reasoning-first models (e.g., OpenAI o1, DeepSeek R1) have spurred a resurgence of interest in RLVR. Nevertheless, advances are dominated by mathematics (e.g., AIME), with competitive-programming code generation underexplored and data curation receiving less attention than RL algorithm design. We investigate how to construct RLVR datasets (i.e., RL prompts) and present practical training techniques that yield strong performance on competitive-programming code generation. Our pipeline begins with supervised fine-tuning (SFT) distilled from strong open-source models, augmented with general-purpose and reasoning-intensive data. RL then follows a two-stage process with executable, testcase-driven rewards: first, training on a large, uniformly distributed set of competitive-programming problems using Group Relative Policy Optimization (GRPO) with 8 rollouts per prompt and a relatively…

Taxonomy

Topics: Reinforcement Learning in Robotics · Mobile Crowdsensing and Crowdsourcing · Machine Learning and Data Classification