Listwise Direct Preference Optimization with Multi-Dimensional Preference Mixing

Yuhui Sun; Xiyao Wang; Zixi Li; YiTian Ding; Tianyang Ling; Jialuo Chen; Tianyi Yu; Zhenlong Yuan; Jinman Zhao

arXiv:2506.19780 · cs.LG · January 22, 2026

TL;DR

This paper introduces $λ$-DPO, a unified framework for preference optimization that models multi-dimensional human preferences using a mixture of listwise distributions, improving flexibility and robustness in preference learning.

Contribution

The paper proposes $λ$-DPO, a novel method that captures multi-dimensional preferences with a mixture model and an adaptive scheduler, enhancing preference modeling and robustness.

Findings

01. Consistent performance improvements across multiple benchmarks.

02. Effective modeling of multi-dimensional human preferences.

03. Robustness gained through adaptive preference weighting.

Abstract

Recent alignment methods based on Direct Preference Optimization (DPO) reformulate preference learning as supervised optimization over pairwise comparisons, offering improved efficiency and stability over reinforcement learning from human feedback (RLHF). However, existing DPO-style methods implicitly assume a single fixed preference objective, which limits their ability to model the structured and sometimes conflicting nature of real-world human judgments that span multiple preference dimensions. In this work, we propose Listwise Direct Preference Optimization ($λ$-DPO), a unified framework that simultaneously improves supervision granularity and preference flexibility. Instead of collapsing multi-dimensional preference signals into a single ranking, $λ$-DPO constructs a mixture of listwise preference distributions weighted by a preference vector $λ$ on the…
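To make the idea of a $λ$-weighted mixture of listwise distributions concrete, here is a minimal sketch in plain Python. It is one plausible reading of the abstract, not the authors' implementation: the function names, the softmax over DPO-style implicit rewards, the cross-entropy loss, the default `beta`, and the requirement that `lam` be non-negative and sum to 1 are all assumptions made for illustration.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]

def mix_targets(targets, lam):
    """Convex combination of per-dimension target distributions.

    targets: one distribution over the N candidate responses per
             preference dimension (hypothetical input format).
    lam:     preference weights; assumed non-negative and summing to 1.
    """
    if min(lam) < 0 or abs(sum(lam) - 1.0) > 1e-9:
        raise ValueError("lam must be non-negative and sum to 1")
    n = len(targets[0])
    return [sum(w * t[i] for w, t in zip(lam, targets)) for i in range(n)]

def listwise_mixture_loss(policy_logps, ref_logps, targets, lam, beta=0.1):
    """Cross-entropy between the mixed target and the policy's list distribution.

    The policy's distribution over the list is a softmax of DPO-style
    implicit rewards beta * (log pi - log pi_ref). This form is assumed.
    """
    rewards = [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]
    log_probs = log_softmax(rewards)
    mixed = mix_targets(targets, lam)
    return -sum(q * lp for q, lp in zip(mixed, log_probs))

# Usage: three candidate responses, two preference dimensions
# (labelled "helpfulness" and "safety" purely for illustration).
targets = [[0.7, 0.2, 0.1],   # dimension 1 prefers response 0
           [0.1, 0.3, 0.6]]   # dimension 2 prefers response 2
loss = listwise_mixture_loss(
    policy_logps=[-1.0, -2.0, -3.0],
    ref_logps=[-2.0, -2.0, -2.0],
    targets=targets,
    lam=[0.5, 0.5],
)
```

Under these assumptions, a policy identical to the reference assigns uniform probability to the list, so the loss equals $\log N$ for any choice of `lam`; changing `lam` only shifts which responses the policy is pushed toward.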

Code & Models

Repositories

yuhui15/multi-preference-lambda-weighted-dpo (PyTorch, official)

Taxonomy

Topics: Data Management and Algorithms · Machine Learning and Data Classification · Rough Sets and Fuzzy Logic

Methods: Direct Preference Optimization · ALIGN