UFT: Unifying Supervised and Reinforcement Fine-Tuning
Mingyang Liu, Gabriele Farina, Asuman Ozdaglar

TL;DR
UFT introduces a unified post-training method combining supervised and reinforcement fine-tuning, significantly improving reasoning abilities and convergence speed of large language models across different sizes.
Contribution
The paper proposes UFT, a novel unified fine-tuning approach that integrates SFT and RFT, overcoming their individual limitations and accelerating reasoning convergence.
Findings
UFT outperforms SFT and RFT across various model sizes.
UFT breaks the exponential sample complexity bottleneck of RFT.
The paper proves theoretically that UFT exponentially accelerates convergence on long-horizon reasoning tasks.
Abstract
Post-training has demonstrated its importance in enhancing the reasoning capabilities of large language models (LLMs). The primary post-training methods can be categorized into supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT). SFT is efficient and well-suited for small language models, but it may lead to overfitting and limit the reasoning abilities of larger models. In contrast, RFT generally yields better generalization but depends heavily on the strength of the base model. To address the limitations of SFT and RFT, we propose Unified Fine-Tuning (UFT), a novel post-training paradigm that unifies SFT and RFT into a single, integrated process. UFT enables the model to effectively explore solutions while incorporating informative supervision signals, bridging the gap between memorizing and thinking underlying existing methods. Notably, UFT outperforms both SFT and RFT in…
Code & Models
- 🤗 liumy2010/Qwen2.5-0.5B-countdown-SFT (model, 2 downloads)
- 🤗 liumy2010/Qwen2.5-1.5B-countdown-SFT (model, 1 download)
- 🤗 liumy2010/Qwen2.5-3B-countdown-SFT (model, 2 downloads)
- 🤗 liumy2010/Qwen2.5-0.5B-math-SFT (model, 1 download)
- 🤗 liumy2010/Qwen2.5-1.5B-math-SFT (model, 5 downloads)
- 🤗 liumy2010/Qwen2.5-3B-math-SFT (model, 1 download)
- 🤗 liumy2010/Qwen2.5-0.5B-kk_logic-SFT (model, 1 download)
- 🤗 liumy2010/Qwen2.5-1.5B-kk_logic-SFT (model, 1 download)
- 🤗 liumy2010/Qwen2.5-3B-kk_logic-SFT (model, 1 download)
- 🤗 liumy2010/Qwen2.5-0.5B-countdown-RFT (model, 1 download)
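The checkpoints listed above appear to follow a common naming pattern (model size, task, fine-tuning method). The following is a minimal sketch of how one of them might be loaded with the Hugging Face `transformers` library. The helper names `repo_id` and `load_checkpoint` are hypothetical and introduced only for illustration; the sketch also assumes the checkpoints are standard causal language models loadable through `AutoModelForCausalLM`, which this page does not confirm.

```python
# Sketch only: helper names are hypothetical, not part of the UFT release.
SIZES = ("0.5B", "1.5B", "3B")
TASKS = ("countdown", "math", "kk_logic")

# Only the repositories that appear in the list on this page.
LISTED_CHECKPOINTS = {
    f"liumy2010/Qwen2.5-{size}-{task}-SFT" for size in SIZES for task in TASKS
} | {"liumy2010/Qwen2.5-0.5B-countdown-RFT"}


def repo_id(size: str, task: str, method: str = "SFT") -> str:
    """Build a repository name and check it against the listed checkpoints."""
    name = f"liumy2010/Qwen2.5-{size}-{task}-{method}"
    if name not in LISTED_CHECKPOINTS:
        raise ValueError(f"{name} is not among the checkpoints listed on this page")
    return name


def load_checkpoint(name: str):
    """Download and load a tokenizer and model (needs network access).

    Assumption: the checkpoint is a standard causal LM; requires the
    `transformers` package and a backend such as PyTorch.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)
    return tokenizer, model


if __name__ == "__main__":
    # Resolving the name is offline; load_checkpoint would trigger a download.
    print(repo_id("0.5B", "countdown", "SFT"))
```

Calling `load_checkpoint(repo_id("1.5B", "math"))` would then fetch the corresponding weights. The import sits inside `load_checkpoint` so that name resolution works even where `transformers` is not installed.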
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Shrink and Fine-Tune · Balanced Selection
