Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive

Arka Pal; Deep Karkhanis; Samuel Dooley; Manley Roberts; Siddartha Naidu; Colin White

arXiv:2402.13228 · cs.CL · July 4, 2024 · 6 citations

TL;DR

This paper introduces DPO-Positive, a new loss function that addresses failure modes in preference-based fine-tuning of large language models, leading to improved performance across various tasks and datasets.

Contribution

The authors identify a failure mode in standard DPO and propose DPO-Positive, a novel training method that outperforms DPO and other fine-tuning techniques on multiple benchmarks.

Findings

01 DPO can reduce the likelihood of preferred examples during training.

02 DPO-Positive mitigates this issue and improves downstream task performance.

03 Smaug-72B surpasses 80% average accuracy on the HuggingFace Open LLM Leaderboard.

Abstract

Direct Preference Optimisation (DPO) is effective at significantly improving the performance of large language models (LLMs) on downstream tasks such as reasoning, summarisation, and alignment. Using pairs of preferred and dispreferred data, DPO models the relative probability of picking one response over another. In this work, first we show theoretically that the standard DPO loss can lead to a reduction of the model's likelihood of the preferred examples, as long as the relative probability between the preferred and dispreferred classes increases. We then show empirically that this phenomenon occurs when fine-tuning LLMs on common datasets, especially datasets in which the edit distance between pairs of completions is low. Using these insights, we design DPO-Positive (DPOP), a new loss function and training procedure which avoids this failure mode. Surprisingly, we find that DPOP…
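The failure mode described in the abstract can be illustrated with a small numeric sketch. The code below computes the standard DPO loss for a single preference pair from sequence log-probabilities, and shows that the loss falls when the dispreferred completion loses probability faster than the preferred one, even though the preferred completion's own likelihood has dropped. The second function adds a penalty that activates only when the policy assigns the preferred completion a lower log-probability than the reference model does. This page does not state the DPOP formula, so the penalty form, the function names, and the values of `beta` and `lam` are assumptions chosen for illustration and should not be read as the authors' implementation.

```python
import math


def neg_log_sigmoid(z):
    """Numerically stable -log(sigmoid(z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1):
    """Standard DPO loss for one (preferred, dispreferred) pair.

    pol_w / pol_l: policy log-probs of the preferred / dispreferred completion.
    ref_w / ref_l: reference-model log-probs of the same completions.
    """
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    return neg_log_sigmoid(beta * margin)


def penalised_loss(pol_w, pol_l, ref_w, ref_l, beta=0.1, lam=5.0):
    """DPO loss plus a penalty on lost preferred likelihood.

    Assumption: the penalty max(0, ref_w - pol_w) is an illustrative form,
    not taken from this page. It is zero whenever the policy keeps the
    preferred completion at least as likely as the reference model does.
    """
    margin = (pol_w - ref_w) - (pol_l - ref_l)
    penalty = max(0.0, ref_w - pol_w)
    return neg_log_sigmoid(beta * (margin - lam * penalty))


# Hypothetical log-probabilities: the reference scores both completions at -10.
ref_w, ref_l = -10.0, -10.0

# Start of training: the policy equals the reference.
start = (-10.0, -10.0)
# Later: both completions are less likely, the dispreferred one much more so.
later = (-12.0, -20.0)

for name, (pol_w, pol_l) in [("start", start), ("later", later)]:
    print(
        name,
        "dpo:", round(dpo_loss(pol_w, pol_l, ref_w, ref_l), 4),
        "penalised:", round(penalised_loss(pol_w, pol_l, ref_w, ref_l), 4),
    )
```

In this toy setting the standard loss is lower at `later` than at `start`, so gradient descent is rewarded for a policy that makes the preferred completion less likely. The penalised variant scores `later` worse than `start`, which removes that incentive.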

Taxonomy

Topics: Decision-Making and Behavioral Economics

Methods: Direct Preference Optimization