Towards Revealing the Effectiveness of Small-Scale Fine-tuning in R1-style Reinforcement Learning
Yutong Chen, Jiandong Gao, Ji Wu

TL;DR
This paper investigates the effectiveness of small-scale fine-tuning in R1-style reinforcement learning for large language models, proposing a new re-distillation method that improves efficiency and performance with fewer samples.
Contribution
It introduces an analytical framework to understand small-scale fine-tuning in RL, and proposes re-distillation to enhance its efficiency and effectiveness.
Findings
Re-distillation matches RL performance with fewer samples
Re-distilled models outperform larger models on key datasets
Re-distillation efficiently balances multiple RL goals
Abstract
R1-style Reinforcement Learning (RL) significantly enhances Large Language Models' reasoning capabilities, yet the mechanism behind rule-based RL remains unclear. We found that small-scale SFT has a substantial influence on RL but shows poor efficiency. To explain our observations, we propose an analytical framework and compare the efficiency of SFT and RL by measuring sample effect. Our hypothetical analysis shows the potential to improve SFT efficiency. Guided by our analysis, we propose Re-distillation, a technique that aims to boost the effectiveness of small-scale distillation by sampling from the RL-trained policy. Re-distillation shows consistently surprising efficiency on three datasets and on both Qwen and Llama models: re-distilled models matched RL performance with far fewer samples and less computation. As a result, on the K&K dataset, our re-distilled Qwen-2.5-1.5B…
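The abstract describes Re-distillation only as sampling from the RL-trained policy to build a small distillation set. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. The function `build_redistillation_set`, the toy policy, and the toy reward are hypothetical names, and keeping only responses that pass a rule-based check is an assumption that the abstract does not confirm.

```python
import random
from typing import Callable, List, Tuple

Policy = Callable[[str, random.Random], str]
Reward = Callable[[str, str], float]


def build_redistillation_set(
    policy: Policy,
    prompts: List[str],
    reward_fn: Reward,
    samples_per_prompt: int = 4,
    max_examples: int = 100,
    seed: int = 0,
) -> List[Tuple[str, str]]:
    """Collect a small SFT set by sampling from an (assumed) RL-trained policy."""
    rng = random.Random(seed)
    dataset: List[Tuple[str, str]] = []
    for prompt in prompts:
        if len(dataset) >= max_examples:
            break
        for _ in range(samples_per_prompt):
            response = policy(prompt, rng)
            # Assumption: keep a sample only if the rule-based reward accepts it.
            if reward_fn(prompt, response) > 0:
                dataset.append((prompt, response))
                break  # at most one accepted response per prompt
    return dataset


def toy_policy(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an RL-trained model: usually adds correctly."""
    a, b = map(int, prompt.split("+"))
    return str(a + b + rng.choice([0, 0, 0, 1]))


def toy_reward(prompt: str, response: str) -> float:
    """Hypothetical rule-based reward: 1.0 for the exact sum, else 0.0."""
    a, b = map(int, prompt.split("+"))
    return 1.0 if int(response) == a + b else 0.0


prompts = ["1+2", "3+4", "10+5"]
sft_data = build_redistillation_set(toy_policy, prompts, toy_reward)
```

In an actual pipeline, `sft_data` would then be used to fine-tune the base model with standard supervised training. That step is omitted here because the listing does not specify it.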
Taxonomy
Methods: Shrink and Fine-Tune
