ASFT: Aligned Supervised Fine-Tuning through Absolute Likelihood

Ruoyu Wang; Jiachen Sun; Shaowei Hua; Quan Fang

arXiv:2409.10571 · cs.LG · September 18, 2024

TL;DR

This paper introduces ASFT, a novel fine-tuning method that improves alignment of large language models with human preferences by optimizing absolute likelihood, addressing limitations of DPO.

Contribution

ASFT is a new fine-tuning approach that optimizes absolute likelihood for better alignment, eliminating the need for a reference model and mitigating issues in DPO.

Findings

1. ASFT outperforms DPO and variants on instruction-following benchmarks.

2. Theoretical analysis shows ASFT mitigates probability decrease of dispreferred data.

3. Extensive experiments confirm ASFT's effectiveness in model alignment.

Abstract

Direct Preference Optimization (DPO) is a method for enhancing model performance by directly optimizing for the preferences or rankings of outcomes, instead of traditional loss functions. This approach has proven effective in aligning Large Language Models (LLMs) with human preferences. Despite its widespread use across various tasks, DPO has been criticized for its sensitivity to the effectiveness of Supervised Fine-Tuning (SFT) and its limitations in enabling models to learn human-preferred responses, leading to less satisfactory performance. To address these limitations, we propose Aligned Supervised Fine-Tuning (ASFT), an effective approach that better aligns LLMs with pair-wise datasets by optimizing absolute likelihood for each response, rather than using the Bradley-Terry model, and eliminates the need for a reference model. Through theoretical gradient analysis, we demonstrate…
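The contrast the abstract draws, a Bradley-Terry objective measured against a reference model versus a per-response absolute likelihood with no reference model, can be illustrated with a minimal sketch. It assumes sequence-level log-probabilities are already available as plain floats. `dpo_loss` follows the standard published DPO formulation. `absolute_likelihood_loss` is a hypothetical stand-in for a reference-free, per-response objective; it is not taken from the paper, whose exact loss is not given in this excerpt.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair.

    Applies the Bradley-Terry model to the difference of policy/reference
    log-ratios, so it needs log-probabilities from a frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


def absolute_likelihood_loss(logp_w: float, logp_l: float) -> float:
    """Hypothetical reference-free objective (an assumption, not the paper's loss).

    Treats each response on its own: -log p(y_w|x) - log(1 - p(y_l|x)).
    Requires logp_l < 0 so that 1 - p(y_l|x) stays positive.
    """
    p_l = math.exp(logp_l)
    return -logp_w - math.log1p(-p_l)


if __name__ == "__main__":
    # Policy identical to the reference: the DPO margin is zero.
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
    # The reference-free sketch depends only on the policy's own likelihoods.
    print(absolute_likelihood_loss(-1.0, -3.0))
```

Note that the DPO term depends only on relative log-ratios, so it is unchanged if the preferred and dispreferred log-probabilities shift together, while the per-response sketch responds to each likelihood's absolute value.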

Code & Models

Repositories

turbo-llm/turbo-alignment (PyTorch)

Taxonomy

Topics: Image and Signal Denoising Methods

Methods: Direct Preference Optimization