Synthetic Data RL: Task Definition Is All You Need

Yiduo Guo; Zhen Guo; Chuanwei Huang; Zi-Ang Wang; Zekai Zhang; Haofei Yu; Huishuai Zhang; Yikang Shen

arXiv:2505.17063·cs.CL·May 26, 2025

TL;DR

Synthetic Data RL reinforcement fine-tunes models using only synthetic data generated from a task definition, reducing reliance on human-labeled data and improving performance on math, science, medical, legal, and finance benchmarks.

Contribution

The paper introduces Synthetic Data RL, a simple, general method for reinforcement fine-tuning models solely with synthetic data derived from a task definition. On Qwen-2.5-7B it reports absolute gains over the base model on six benchmarks, ranging from 8.7% on MATH to 29.2% on GSM8K.

Findings

01  Achieves a 29.2% absolute improvement over the Qwen-2.5-7B base model on GSM8K.

02  Surpasses supervised fine-tuning under the same data budget.

03  Adding human demonstrations yields only limited benefit.

Abstract

Reinforcement learning (RL) is a powerful way to adapt foundation models to specialized tasks, but its reliance on large-scale human-labeled data limits broad adoption. We introduce Synthetic Data RL, a simple and general framework that reinforcement fine-tunes models using only synthetic data generated from a task definition. Our method first generates question and answer pairs from the task definition and retrieved documents, then adapts the difficulty of the question based on model solvability, and selects questions using the average pass rate of the model across samples for RL training. On Qwen-2.5-7B, our method achieves a 29.2% absolute improvement over the base model on GSM8K (+2.9 pp vs. instruction-tuned, +6.6 pp vs. Self-Instruct), 8.7% on MATH, 13.1% on GPQA (+7.0 pp vs. SynthLLM), 8.9% on MedQA, 17.7% on CQA (law) and 13.7% on CFA (finance). It surpasses supervised…
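The selection step in the abstract (scoring each synthetic question by the model's average pass rate across samples) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `pass_rate`, `select_for_rl`, and `sample_fn`, the exact-match check, the sample count `k`, and the rule of keeping only questions the model solves sometimes but not always are all assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (question, reference answer)


def pass_rate(question: str, answer: str,
              sample_fn: Callable[[str], str], k: int = 8) -> float:
    """Fraction of k sampled model answers that exactly match the reference."""
    hits = sum(sample_fn(question).strip() == answer.strip() for _ in range(k))
    return hits / k


def select_for_rl(pairs: List[Pair], sample_fn: Callable[[str], str],
                  k: int = 8, budget: Optional[int] = None
                  ) -> List[Tuple[float, str, str]]:
    """Score every pair, then keep those with 0 < pass rate < 1.

    Assumption: questions that are always or never solved are dropped here;
    the abstract only says selection uses the average pass rate.
    """
    scored = [(pass_rate(q, a, sample_fn, k), q, a) for q, a in pairs]
    kept = [s for s in scored if 0.0 < s[0] < 1.0]
    kept.sort(key=lambda s: s[0])  # lowest non-zero pass rate first (assumed order)
    return kept[:budget]           # budget=None keeps everything that passed the filter
```

Here `sample_fn` stands in for one sampled generation from the model being trained; in practice the answer check would likely need task-specific normalization rather than exact string matching.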

Code & Models

Repositories

gydpku/data_synthesis_rl (PyTorch, official)

Taxonomy

Methods: Balanced Selection