UFT: Unifying Supervised and Reinforcement Fine-Tuning

Mingyang Liu; Gabriele Farina; Asuman Ozdaglar

arXiv:2505.16984·cs.LG·October 21, 2025

TL;DR

UFT is a post-training method that unifies supervised and reinforcement fine-tuning into a single process, improving the reasoning ability and convergence speed of large language models across model sizes.

Contribution

The paper proposes UFT, a novel unified fine-tuning approach that integrates SFT and RFT, overcoming their individual limitations and accelerating reasoning convergence.

Findings

01 UFT outperforms SFT and RFT across various model sizes.

02 UFT breaks the exponential sample complexity bottleneck of RFT.

03 The paper proves theoretically that UFT exponentially accelerates convergence on long-horizon reasoning tasks.

Abstract

Post-training has demonstrated its importance in enhancing the reasoning capabilities of large language models (LLMs). The primary post-training methods can be categorized into supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT). SFT is efficient and well-suited for small language models, but it may lead to overfitting and limit the reasoning abilities of larger models. In contrast, RFT generally yields better generalization but depends heavily on the strength of the base model. To address the limitations of SFT and RFT, we propose Unified Fine-Tuning (UFT), a novel post-training paradigm that unifies SFT and RFT into a single, integrated process. UFT enables the model to effectively explore solutions while incorporating informative supervision signals, bridging the gap between memorizing and thinking underlying existing methods. Notably, UFT outperforms both SFT and RFT in…

Code & Models

Repositories

liumy2010/uft (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Shrink and Fine-Tune · Balanced Selection