Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning

Yutong Chen; Jiandong Gao; Ji Wu

arXiv:2505.17988·cs.LG·August 6, 2025

TL;DR

This paper investigates why small-scale supervised fine-tuning (SFT) strongly influences R1-style reinforcement learning for large language models yet remains inefficient, and proposes Re-distillation, which distills from samples drawn from the RL-trained policy and matches RL performance with far fewer samples and less computation.

Contribution

The paper introduces an analytical framework that compares the efficiency of SFT and RL by measuring sample effect, and proposes Re-distillation to make small-scale distillation more effective.

Findings

01 Re-distillation matches RL performance with fewer samples

02 Re-distilled models outperform larger models on key datasets

03 Re-distillation efficiently balances multiple RL goals

Abstract

R1-style Reinforcement Learning (RL) significantly enhances Large Language Models' reasoning capabilities, yet the mechanism behind rule-based RL remains unclear. We found that small-scale SFT has a substantial influence on RL but shows poor efficiency. To explain our observations, we propose an analytical framework and compare the efficiency of SFT and RL by measuring sample effect. Our hypothetical analysis shows the potential to improve SFT efficiency. Guided by our analysis, we propose Re-distillation, a technique that aims to boost the effectiveness of small-scale distillation by sampling from the RL-trained policy. Re-distillation shows consistently surprising efficiency on three datasets and on both Qwen and Llama models: re-distilled models matched RL performance with far fewer samples and less computation. As a result, on the K&K dataset, our re-distilled Qwen-2.5-1.5B…

Code & Models

Repositories

on1262/deep-reasoning
PyTorch · Official

Datasets

Chen-YT/deep-reasoning-kk
dataset · 7 dl

Taxonomy

Methods: Shrink and Fine-Tune