Listwise Direct Preference Optimization with Multi-Dimensional Preference Mixing
Yuhui Sun, Xiyao Wang, Zixi Li, YiTian Ding, Tianyang Ling, Jialuo Chen, Tianyi Yu, Zhenlong Yuan, Jinman Zhao

TL;DR
This paper introduces $\lambda$-DPO, a unified framework for preference optimization that models multi-dimensional human preferences using a mixture of listwise distributions, improving flexibility and robustness in preference learning.
Contribution
The paper proposes $\lambda$-DPO, a method that captures multi-dimensional preferences with a mixture model and an adaptive scheduler, enhancing preference modeling and robustness.
Findings
Consistent performance improvements across multiple benchmarks.
Effective modeling of multi-dimensional human preferences.
Robustness gained through adaptive preference weighting.
Abstract
Recent alignment methods based on Direct Preference Optimization (DPO) reformulate preference learning as supervised optimization over pairwise comparisons, offering improved efficiency and stability over reinforcement learning from human feedback (RLHF). However, existing DPO-style methods implicitly assume a single fixed preference objective, which limits their ability to model the structured and sometimes conflicting nature of real-world human judgments that span multiple preference dimensions. In this work, we propose Listwise Direct Preference Optimization ($\lambda$-DPO), a unified framework that simultaneously improves supervision granularity and preference flexibility. Instead of collapsing multi-dimensional preference signals into a single ranking, $\lambda$-DPO constructs a mixture of listwise preference distributions weighted by a preference vector on the…
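The abstract is truncated here, so the following is only a minimal sketch of what a mixture of listwise preference distributions could look like, not the authors' implementation. It assumes that each preference dimension supplies a target distribution over the same list of candidate responses, that the preference vector is a set of non-negative weights summing to one, and that the policy's list distribution is a softmax over DPO-style implicit rewards `beta * (log pi - log pi_ref)`. The function names, the `beta` default, and the cross-entropy form of the loss are all illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_targets(per_dim_targets, weights):
    """Convex combination of per-dimension listwise target distributions.

    per_dim_targets: one distribution over the N responses per dimension.
    weights: preference vector, assumed non-negative and summing to one.
    """
    if min(weights) < 0 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    n = len(per_dim_targets[0])
    return [
        sum(w * target[i] for w, target in zip(weights, per_dim_targets))
        for i in range(n)
    ]


def listwise_dpo_loss(policy_logps, ref_logps, target, beta=0.1):
    """Cross-entropy between the mixed target and the policy's list distribution.

    The policy distribution is a softmax over assumed implicit rewards
    beta * (log pi(y|x) - log pi_ref(y|x)) for each response in the list.
    """
    implicit_rewards = [beta * (p - r) for p, r in zip(policy_logps, ref_logps)]
    policy_dist = softmax(implicit_rewards)
    return -sum(t * math.log(q) for t, q in zip(target, policy_dist))


# Hypothetical example: two preference dimensions, three candidate responses.
helpfulness = [0.7, 0.2, 0.1]
harmlessness = [0.1, 0.3, 0.6]
mixed = mix_targets([helpfulness, harmlessness], weights=[0.5, 0.5])
loss = listwise_dpo_loss(
    policy_logps=[-12.0, -15.0, -14.0],
    ref_logps=[-13.0, -14.5, -14.0],
    target=mixed,
)
```

Changing `weights` shifts the target toward one dimension or the other without collapsing both into a single ranking, which is the flexibility the abstract describes. When the policy matches the reference, every implicit reward is zero, so the policy distribution is uniform and the loss equals `log(N)` for any target.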
Taxonomy
Topics: Data Management and Algorithms · Machine Learning and Data Classification · Rough Sets and Fuzzy Logic
Methods: Direct Preference Optimization · ALIGN
