DPO-Shift: Shifting the Distribution of Direct Preference Optimization

Xiliang Yang; Feng Jiang; Qianen Zhang; Lei Zhao; Xiao Li

arXiv:2502.07599 · cs.CL · June 9, 2025

TL;DR

This paper introduces DPO-Shift, a method to control the distribution of chosen response probabilities in preference optimization, addressing likelihood displacement and improving alignment with human preferences.

Contribution

DPO-Shift provides a simple, theoretically grounded approach to mitigate likelihood displacement in preference optimization, with demonstrated improvements on downstream tasks.

Findings

1. DPO-Shift effectively shifts the chosen probability distribution.
2. There is a fundamental trade-off between chosen probability and reward margin.
3. DPO-Shift outperforms standard DPO on downstream benchmarks.

Abstract

Direct Preference Optimization (DPO) and its variants have become increasingly popular for aligning language models with human preferences. These methods aim to teach models to better distinguish between chosen (or preferred) and rejected (or dispreferred) responses. However, prior research has found that the probability of chosen responses often decreases during training, a phenomenon known as likelihood displacement. To tackle this challenge, we introduce DPO-Shift, which controllably shifts the distribution of the chosen probability. We then show that DPO-Shift exhibits a fundamental trade-off between improving the chosen probability and sacrificing the reward margin, as supported by both theoretical analysis and experimental validation. Furthermore, we demonstrate the superiority of DPO-Shift over DPO on downstream tasks such as MT-Bench and a designed win…
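
To make likelihood displacement concrete, the sketch below computes the standard per-example DPO loss from sequence log-probabilities and shows that the loss can fall while the chosen log-probability also falls. It is a minimal illustration in plain Python, not the authors' implementation: the function names, the example numbers, and especially the `rejected_scale` argument are assumptions introduced here. This page does not state the DPO-Shift objective, so `rejected_scale` should be read only as a hypothetical knob for down-weighting the rejected term, not as the paper's formula.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected,
             beta=0.1, rejected_scale=1.0):
    """Per-example DPO loss from sequence log-probabilities.

    rejected_scale=1.0 gives the standard DPO loss. Other values are a
    hypothetical illustration (an assumption of this sketch, not taken
    from the paper) of weakening the push on the rejected response.
    Returns (loss, chosen_reward, reward_margin).
    """
    # Implicit rewards: beta times the log-ratio of policy to reference.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    reward_margin = chosen_reward - rejected_reward
    loss = -log_sigmoid(chosen_reward - rejected_scale * rejected_reward)
    return loss, chosen_reward, reward_margin


# Made-up numbers: the reference model scores chosen at -10 and rejected at -11.
at_reference = dpo_loss(-10.0, -11.0, -10.0, -11.0)

# After training, BOTH log-probabilities dropped, the rejected one by more.
# The loss is lower and the margin is positive, yet the chosen response
# became less likely: this is likelihood displacement.
displaced = dpo_loss(-12.0, -20.0, -10.0, -11.0)

# Same policy scored with the hypothetical down-weighted rejected term.
scaled = dpo_loss(-12.0, -20.0, -10.0, -11.0, rejected_scale=0.5)
```

In this toy case the standard loss rewards the displaced policy because only the difference between the two implicit rewards matters. With the rejected term down-weighted, the same policy scores a higher loss, since the drop in the chosen log-probability is no longer fully offset by the drop in the rejected one.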

Code & Models

Repositories

meaquadddd/dpo-shift (PyTorch, official)

Taxonomy

Topics: Constraint Satisfaction and Optimization

Methods: Direct Preference Optimization