Discriminative Finetuning of Generative Large Language Models without Reward Models and Human Preference Data
Siqi Guo, Ilgee Hong, Vicente Balmaseda, Changlong Yu, Liang Qiu, Xin Liu, Haoming Jiang, Tuo Zhao, Tianbao Yang

TL;DR
This paper introduces Discriminative Fine-Tuning (DFT), a method for aligning large language models that improves over supervised fine-tuning by training the model with a discriminative rather than a generative objective, without requiring human preference data or reward models.
Contribution
The paper proposes a discriminative learning framework for LLM fine-tuning, providing algorithms and demonstrating superior performance over traditional supervised fine-tuning.
Findings
DFT outperforms standard SFT in experiments.
DFT achieves comparable or better results than SFT followed by preference optimization.
The approach reduces reliance on human-labeled preference data or reward models.
Abstract
Supervised fine-tuning (SFT) has become a crucial step for aligning pretrained large language models (LLMs) using supervised datasets of input-output pairs. However, despite being supervised, SFT is inherently limited by its generative training objective. To address its limitations, the existing common strategy is to follow SFT with a separate phase of preference optimization (PO), which relies on either human-labeled preference data or a strong reward model to guide the learning process. In this paper, we address the limitations of SFT by exploring one of the most successful techniques in conventional supervised learning: discriminative learning. We introduce Discriminative Fine-Tuning (DFT), an improved variant of SFT, which mitigates the burden of collecting human-labeled preference data or training strong reward models. Unlike SFT that employs a generative approach and overlooks…
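To make the generative-versus-discriminative contrast in the abstract concrete, here is a minimal sketch. It is not the paper's DFT objective, which the truncated abstract above does not spell out. It assumes a generic softmax cross-entropy over one reference answer and a handful of alternative answers, each summarized by a scalar sequence score (for example, a summed token log-probability). The function names, the `temperature` parameter, and the idea of supplying explicit alternative answers are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def generative_loss(ref_score: float) -> float:
    """SFT-style objective: raise the score of the reference answer only.

    `ref_score` is assumed to be the log-likelihood of the reference answer.
    """
    return -ref_score


def discriminative_loss(
    ref_score: float,
    alt_scores: Sequence[float],
    temperature: float = 1.0,  # hypothetical scaling knob, not from the source
) -> float:
    """Softmax cross-entropy with the reference answer as the correct class.

    Lowering the loss requires raising the reference score *relative to*
    the alternatives, not just raising it in isolation.
    """
    scores = [ref_score / temperature] + [s / temperature for s in alt_scores]
    # Numerically stable log-sum-exp over all candidate scores.
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[0]


if __name__ == "__main__":
    ref = -2.0
    print(generative_loss(ref))
    # Same reference score, but weaker alternatives give a lower loss.
    print(discriminative_loss(ref, [-2.0, -2.0]))
    print(discriminative_loss(ref, [-6.0, -6.0]))
```

In this toy setup the generative loss depends only on the reference score, while the discriminative loss also drops when competing answers become less likely, which is the general property that discriminative learning exploits.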
Taxonomy
Methods: Discriminative Fine-Tuning · Shrink and Fine-Tune
