$\xi$-DPO: Direct Preference Optimization via Ratio Reward Margin

Zhengyuan Fan; Zhonghua Wu; Yuxuan Du; Qun Chen

arXiv:2605.10981 · cs.LG · May 13, 2026

TL;DR

The paper introduces $\xi$-DPO, a reference-free preference optimization method that reformulates the objective around a ratio reward margin $\xi$, making the margin interpretable and avoiding the joint tuning of $\beta$ and $\gamma$.

Contribution

It proposes a new ratio reward margin formulation that simplifies preference optimization and removes the need for tuning hyperparameters like $\beta$ and $\gamma$.

Findings

01

$\xi$-DPO effectively cancels the effect of $\beta$ in preference optimization.

02

The ratio reward margin $\xi$ is interpretable and can be set based on initial reward gap distribution.

03

Experimental results demonstrate improved stability and performance over existing methods.

Abstract

Reference-free preference optimization has emerged as an efficient alternative to reinforcement learning from human feedback, with Simple Preference Optimization (SimPO) demonstrating strong performance by eliminating the explicit reference model through a simple objective. However, the joint tuning of the hyperparameters $\beta$ and $\gamma$ in SimPO remains a central challenge. We argue that this difficulty arises because the margin formulation in SimPO is not easily interpretable across datasets with different reward gap structures. To better understand this issue, we conduct a comprehensive analysis of SimPO and find that $\beta$ implicitly controls sample filtering, while the effect of $\gamma$ depends on the reward gap structure of the dataset. Motivated by these observations, we propose $\xi$-DPO: Direct Preference Optimization via Ratio Reward Margin. We first reformulate the…
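For orientation, the roles of $\beta$ and $\gamma$ in the SimPO objective mentioned above can be sketched in a few lines. This is a minimal sketch of the published SimPO loss for a single preference pair, not of $\xi$-DPO itself, whose formulation is not given in the text above. The function names, argument names, and example values are illustrative assumptions.

```python
import math


def _sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def _softplus(x):
    # Stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def simpo_margin(logp_chosen, logp_rejected, len_chosen, len_rejected, beta, gamma):
    """Return z = beta * (length-normalized log-prob gap) - gamma.

    logp_* are summed token log-probabilities under the policy,
    len_* are the corresponding token counts (hypothetical inputs).
    """
    reward_chosen = beta * logp_chosen / len_chosen
    reward_rejected = beta * logp_rejected / len_rejected
    return reward_chosen - reward_rejected - gamma


def simpo_loss(logp_chosen, logp_rejected, len_chosen, len_rejected, beta, gamma):
    # SimPO loss for one pair: -log(sigmoid(z)) == softplus(-z).
    z = simpo_margin(logp_chosen, logp_rejected, len_chosen, len_rejected, beta, gamma)
    return _softplus(-z)


def pair_weight(logp_chosen, logp_rejected, len_chosen, len_rejected, beta, gamma):
    # Magnitude of d(loss)/dz, i.e. how strongly this pair drives the update.
    z = simpo_margin(logp_chosen, logp_rejected, len_chosen, len_rejected, beta, gamma)
    return _sigmoid(-z)
```

As an illustrative reading only (not a result quoted from the paper): for a pair whose length-normalized gap already exceeds $\gamma/\beta$, increasing $\beta$ while holding $\gamma/\beta$ fixed pushes `pair_weight` toward zero, which is one way to picture $\beta$ acting as an implicit sample filter. For example, `pair_weight(-10, -20, 10, 10, beta=2.0, gamma=1.0)` is larger than `pair_weight(-10, -20, 10, 10, beta=10.0, gamma=5.0)`.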
