Direct Preference Optimization With Unobserved Preference Heterogeneity: The Necessity of Ternary Preferences

Keertana Chidambaram; Karthik Vinay Seetharaman; Vasilis Syrgkanis

arXiv:2405.15065 · cs.LG · October 21, 2025

TL;DR

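This paper addresses two gaps in language model alignment: unobserved preference heterogeneity among annotators and the limits of binary preference data. It shows that rankings over three or more responses make latent preferences identifiable, and it proposes an EM-based method for personalization and a min-max regret aggregation for fairness across annotator types.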

Contribution

The paper introduces a theoretical framework and algorithms for incorporating heterogeneous user preferences and fairness into language model alignment, moving beyond binary comparisons to rankings over three or more responses.

Findings

1. Rankings over three or more responses ensure preference identifiability.

2. An EM-based method discovers latent annotator types for personalized models.

3. A min-max regret aggregation guarantees equitable performance.
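
The min-max regret idea in the last finding can be illustrated with a small, generic sketch. This is not the paper's algorithm: it assumes a finite set of candidate policies and a hypothetical table `utilities[i][k]` giving how well candidate `i` serves annotator type `k`. The function name and the numbers are illustrative assumptions. The regret of a candidate for a type is the gap to the best candidate for that type, and the aggregate choice is the candidate whose worst regret across types is smallest.

```python
def minmax_regret_choice(utilities):
    """Pick the candidate with the smallest worst-case regret.

    utilities[i][k] is a hypothetical utility of candidate i for
    annotator type k (an assumption made for this sketch).
    Returns (index of chosen candidate, worst-case regret per candidate).
    """
    num_types = len(utilities[0])
    # Best achievable utility for each annotator type.
    best_per_type = [max(row[k] for row in utilities) for k in range(num_types)]
    # Worst regret of each candidate across all annotator types.
    worst_regret = [
        max(best_per_type[k] - row[k] for k in range(num_types))
        for row in utilities
    ]
    chosen = min(range(len(utilities)), key=worst_regret.__getitem__)
    return chosen, worst_regret


# Two specialists (each ideal for one type) and one compromise candidate.
utilities = [
    [10, 0],  # tailored to type 0
    [0, 10],  # tailored to type 1
    [7, 6],   # compromise
]
chosen, worst_regret = minmax_regret_choice(utilities)
print(chosen, worst_regret)
```

In this toy table each specialist leaves one type with the maximum regret, so the compromise candidate is selected even though it is not the best for either type.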

Abstract

Reinforcement Learning from Human Feedback (RLHF) has become central to aligning large language models with human values, typically by first learning a reward model from preference data which is then used to update the model with reinforcement learning. Recent alternatives such as Direct Preference Optimization (DPO) simplify this pipeline by directly optimizing on preferences. However, both approaches often assume uniform annotator preferences and rely on binary comparisons, overlooking two key limitations: the diversity of human evaluators and the limitations of pairwise feedback. In this work, we address both these issues. First, we connect preference learning in RLHF with the econometrics literature and show that binary comparisons are insufficient for identifying latent user preferences from finite user data and infinite users, while (even incomplete) rankings over three or more…

Taxonomy

Topics: Multi-Criteria Decision Making

Methods: Direct Preference Optimization · ALIGN