Supervised Fine-Tuning versus Reinforcement Learning: A Study of Post-Training Methods for Large Language Models
Haitao Jiang, Wenbo Zhang, Jiarui Yao, Hengrui Cai, Sheng Wang, Rui Song

TL;DR
This paper provides a comprehensive analysis of post-training methods for large language models, focusing on supervised fine-tuning and reinforcement learning, their integration, and emerging hybrid paradigms from recent studies.
Contribution
It offers a unified framework connecting SFT and RL, analyzing their interplay, and highlighting trends toward hybrid post-training approaches in LLMs.
Findings
Hybrid training pipelines are increasingly prevalent.
SFT and RL are often combined for improved performance.
Recent studies favor hybrid paradigms for scalability and effectiveness.
Abstract
Pre-trained Large Language Model (LLM) exhibits broad capabilities, yet, for specific tasks or domains their attainment of higher accuracy and more reliable reasoning generally depends on post-training through Supervised Fine-Tuning (SFT) or Reinforcement Learning (RL). Although often treated as distinct methodologies, recent theoretical and empirical developments demonstrate that SFT and RL are closely connected. This study presents a comprehensive and unified perspective on LLM post-training with SFT and RL. We first provide an in-depth overview of both techniques, examining their objectives, algorithmic structures, and data requirements. We then systematically analyze their interplay, highlighting frameworks that integrate SFT and RL, hybrid training pipelines, and methods that leverage their complementary strengths. Drawing on a representative set of recent application studies from…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications
