DeReason: A Difficulty-Aware Curriculum Improves Decoupled SFT-then-RL Training for General Reasoning

Hanxu Hu; Yuxuan Wang; Maggie Huan; Jannis Vamvas; Yinya Huang; Zhijiang Guo; Rico Sennrich

arXiv:2603.11193·cs.CL·March 13, 2026

Open Access

TL;DR

This paper introduces DeReason, a difficulty-aware curriculum that improves the training of large language models for reasoning tasks by strategically combining supervised fine-tuning and reinforcement learning based on problem difficulty.

Contribution

DeReason proposes a novel data decoupling strategy that partitions training data by reasoning difficulty, enhancing the effectiveness of sequential SFT and RL for general reasoning tasks.
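As a rough illustration of what partitioning a training pool by difficulty could look like, here is a minimal sketch. It is not the authors' implementation: the `Example` type, the `difficulty` score, the `threshold` value, and the `split_by_difficulty` helper are all hypothetical, and the text shown here does not say how difficulty is measured or which partition feeds which training stage.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Example:
    prompt: str
    # Hypothetical difficulty score in [0, 1]; higher means harder.
    # How such a score would be obtained is an assumption, not from the source.
    difficulty: float


def split_by_difficulty(
    examples: List[Example], threshold: float = 0.5
) -> Tuple[List[Example], List[Example]]:
    """Return (easier, harder) partitions of the pool.

    Examples scoring below `threshold` go to the first list,
    the rest go to the second. Input order is preserved.
    """
    easier = [ex for ex in examples if ex.difficulty < threshold]
    harder = [ex for ex in examples if ex.difficulty >= threshold]
    return easier, harder


# Toy pool with made-up scores, for illustration only.
pool = [
    Example("state Ohm's law", 0.1),
    Example("derive the period of a pendulum", 0.6),
    Example("balance a redox equation", 0.4),
    Example("prove a bound on a recurrence", 0.9),
]
easier, harder = split_by_difficulty(pool, threshold=0.5)
```

In a sequential pipeline, one partition would serve as the SFT set and the other as the RL set. A random-split baseline, as mentioned in the findings, would shuffle the pool and cut it at the same sizes instead of using the score.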

Findings

01. Decoupling training data by difficulty improves reasoning performance.

02. DeReason outperforms SFT-only, RL-only, and random-split baselines.

03. The curriculum enhances training efficiency and reasoning capabilities.

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a powerful paradigm for eliciting reasoning capabilities in large language models, particularly in mathematics and coding. While recent efforts have extended this paradigm to broader general scientific (STEM) domains, the complex interplay between supervised fine-tuning (SFT) and RL in these contexts remains underexplored. In this paper, we conduct controlled experiments revealing a critical challenge: for general STEM domains, RL applied directly to base models is highly sample-inefficient and is consistently surpassed by SFT on moderate-quality responses. Yet sequential SFT followed by RL can further improve performance, suggesting that the two stages play complementary roles and that how training data is allocated between them matters. Therefore, we propose DeReason, a difficulty-based data…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques