Step-wise Adaptive Integration of Supervised Fine-tuning and Reinforcement Learning for Task-Specific LLMs

Jack Chen; Fazhong Liu; Naruto Liu; Yuhan Luo; Erqu Qin; Harry Zheng; Tian Dong; Haojin Zhu; Yan Meng; Xiao Wang

arXiv:2505.13026·cs.LG·August 5, 2025

TL;DR

This paper introduces SASR, a step-wise adaptive hybrid training framework that unifies supervised fine-tuning and reinforcement learning to improve task-specific large language models, overcoming limitations of static methods.

Contribution

SASR dynamically balances SFT and RL during training using gradient-based adjustments, inspired by curriculum learning, to enhance reasoning abilities of LLMs.

Findings

01. SASR outperforms SFT, RL, and static hybrid methods in experiments.

02. The adaptive framework maintains core reasoning skills while exploring new paths.

03. Dynamic adjustment improves training stability and generalization.

Abstract

Large language models (LLMs) excel at mathematical reasoning and logical problem-solving. Current training paradigms rely mainly on supervised fine-tuning (SFT) and reinforcement learning (RL) to strengthen these reasoning abilities. Used alone, each has a characteristic weakness: SFT may overfit, while RL is prone to mode collapse. State-of-the-art methods therefore use hybrid training schemes, but static switching between the two generalizes poorly across tasks and depends heavily on data quality. In response, and inspired by the curriculum learning-quiz mechanism in human reasoning cultivation, we propose SASR, a step-wise adaptive hybrid training framework that theoretically unifies SFT and RL and dynamically balances the two throughout optimization. SASR uses SFT for initial warm-up to…
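The step-wise balancing between SFT and RL can be illustrated with a minimal Python sketch. The truncated abstract does not give SASR's actual rule, so everything here is an assumption made for illustration: the function names, the `gamma` parameter, and the mixing formula, which takes an SFT step with probability proportional to the current SFT gradient norm relative to the average norm seen during warm-up.

```python
import random


def sft_probability(current_grad_norm, warmup_grad_norm, gamma=1.0):
    """Hypothetical mixing rule (an assumption, not the paper's formula).

    A large SFT gradient norm relative to the warm-up reference is read as
    "still far from the demonstrations", so SFT is favoured; a small one
    shifts the balance toward RL.
    """
    if current_grad_norm < 0 or warmup_grad_norm <= 0 or gamma <= 0:
        raise ValueError("norms must be non-negative and the reference positive")
    return current_grad_norm / (current_grad_norm + gamma * warmup_grad_norm)


def choose_step(current_grad_norm, warmup_grad_norm, rng, gamma=1.0):
    """Sample the training mode for one step: 'sft' or 'rl'."""
    p_sft = sft_probability(current_grad_norm, warmup_grad_norm, gamma)
    return "sft" if rng.random() < p_sft else "rl"


def schedule(grad_norms, warmup_steps, seed=0, gamma=1.0):
    """Build a per-step plan from a list of observed SFT gradient norms.

    The first `warmup_steps` steps are always SFT; their mean gradient norm
    becomes the reference for every later adaptive decision.
    """
    if not 0 < warmup_steps <= len(grad_norms):
        raise ValueError("warmup_steps must be between 1 and len(grad_norms)")
    rng = random.Random(seed)
    warmup = grad_norms[:warmup_steps]
    warmup_norm = sum(warmup) / len(warmup)
    plan = ["sft"] * warmup_steps
    for norm in grad_norms[warmup_steps:]:
        plan.append(choose_step(norm, warmup_norm, rng, gamma))
    return plan
```

In a real training loop the gradient norms would come from the model at each step rather than from a precomputed list; the list form is used here only to keep the sketch self-contained. The contrast with a static hybrid schedule is that the SFT/RL decision is re-evaluated at every step from a training signal instead of being fixed in advance.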

Taxonomy

Topics: Topic Modeling · Machine Learning and Data Classification · Multimodal Machine Learning Applications

Methods: Shrink and Fine-Tune