ASFT: Aligned Supervised Fine-Tuning through Absolute Likelihood
Ruoyu Wang, Jiachen Sun, Shaowei Hua, Quan Fang

TL;DR
This paper introduces ASFT, a fine-tuning method that aligns large language models with human preferences by optimizing the absolute likelihood of each response, rather than a Bradley-Terry preference objective, to address limitations of DPO.
Contribution
ASFT is a new fine-tuning approach that optimizes absolute likelihood for better alignment, eliminating the need for a reference model and mitigating issues in DPO.
Findings
ASFT outperforms DPO and its variants on instruction-following benchmarks.
Theoretical analysis shows ASFT mitigates probability decrease of dispreferred data.
Extensive experiments confirm ASFT's effectiveness in model alignment.
Abstract
Direct Preference Optimization (DPO) is a method for enhancing model performance by directly optimizing for the preferences or rankings of outcomes, instead of traditional loss functions. This approach has proven effective in aligning Large Language Models (LLMs) with human preferences. Despite its widespread use across various tasks, DPO has been criticized for its sensitivity to the effectiveness of Supervised Fine-Tuning (SFT) and its limitations in enabling models to learn human-preferred responses, leading to less satisfactory performance. To address these limitations, we propose Aligned Supervised Fine-Tuning (ASFT), an effective approach that better aligns LLMs with pair-wise datasets by optimizing absolute likelihood for each response, rather than using the Bradley-Terry model, and eliminates the need for a reference model. Through theoretical gradient analysis, we demonstrate…
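The contrast the abstract draws, between a relative (Bradley-Terry) objective and an absolute per-response likelihood objective, can be illustrated with a small numeric sketch. The DPO loss below follows its standard published form. The function `absolute_likelihood_loss` is a hypothetical stand-in, not the paper's actual objective (which this summary does not state): it assumes length-normalized log-probabilities and simply rewards a high probability for the preferred response and a low probability for the dispreferred one, with no reference model.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one preference pair.

    Depends only on the *difference* of policy/reference log-ratios
    between the preferred (w) and dispreferred (l) responses.
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    return -log_sigmoid(beta * margin)


def absolute_likelihood_loss(logp_w, logp_l):
    """Hypothetical reference-free loss (an assumption for illustration).

    Computes -log p(y_w|x) - log(1 - p(y_l|x)), where the log-probabilities
    are assumed to be length-normalized and strictly negative.
    """
    p_l = math.exp(logp_l)
    return -logp_w - math.log1p(-p_l)


# Reference model log-probabilities (illustrative numbers, not from the paper).
ref_w, ref_l = -1.0, -1.5

# State A: policy equals the reference.
a_w, a_l = -1.0, -1.5
# State B: both responses became less likely, but the margin widened.
b_w, b_l = -2.0, -4.0

dpo_a = dpo_loss(a_w, a_l, ref_w, ref_l)
dpo_b = dpo_loss(b_w, b_l, ref_w, ref_l)
abs_a = absolute_likelihood_loss(a_w, a_l)
abs_b = absolute_likelihood_loss(b_w, b_l)

print(f"DPO loss:      A={dpo_a:.4f}  B={dpo_b:.4f}")
print(f"Absolute loss: A={abs_a:.4f}  B={abs_b:.4f}")
```

In this toy setting the DPO loss goes down from state A to state B even though the preferred response has become less likely, because only the margin matters. The assumed absolute objective goes up, since it penalizes the drop in the preferred response's likelihood directly.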
Taxonomy
Topics: Image and Signal Denoising Methods
Methods: Direct Preference Optimization
