DPO Unchained: Your Training Algorithm is Secretly Disentangled in Human Choice Theory

Wenxuan Zhou; Shujian Zhang; Brice Magdalou; John Lambert; Ehsan Amid; Richard Nock; Andrew Hard

arXiv:2507.07855·cs.LG·February 5, 2026

TL;DR

This paper generalizes the link between Direct Preference Optimization (DPO) and human choice theory, moving from the one specific normative choice model DPO relies on to the full normative framework behind it. The broader view supports non-convex losses and covers existing DPO extensions.

Contribution

It generalizes DPO within a normative human choice framework, which supports non-convex losses and allows any compliant ML analytical choice to be embedded with any human choice model.

Findings

01 Supports non-convex loss functions in DPO

02 Any compliant ML analytical choice can be embedded with any human choice model

03 Provides a normative framework wide enough to cover DPO's extensions

Abstract

Normative theories allow one to elicit key parts of an ML algorithm from first principles, which is crucial at a time of championed scrutiny for ML work. Direct Preference Optimization (DPO) cleverly bypasses reward modeling by making an explicit link with a specific normative model of human choice. Our paper elevates this connection to the full generality of DPO's normative framework. Getting there requires reworking human choice theory's textbook path for a better RLHF/ML fit. It elevates the connection to a remarkably broad viewpoint on preference optimization, considering the current panorama of DPO follow-ups. It also unveils unexpected riches for ML, chief among which are the support for non-convex losses, the fact that any compliant ML analytical choice can be embedded with any human choice model, and a normative framework's umbrella wide enough to safeguard DPO's extensions (margins,…
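As background for the abstract's remark that DPO bypasses reward modeling, the sketch below computes the standard DPO loss for a single preference pair. It follows the formulation from the original DPO paper, where the human choice model is Bradley-Terry; it is not the generalized framework of this paper, whose details are not given in this summary. The function names, parameter names, and the default `beta` are illustrative assumptions.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    # For x >= 0 use -log(1 + e^{-x}); for x < 0 use x - log(1 + e^{x}).
    # Both branches only exponentiate a non-positive number, so no overflow.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    logp_w: float,      # policy log-prob of the preferred (winning) response
    logp_l: float,      # policy log-prob of the rejected (losing) response
    ref_logp_w: float,  # reference-model log-prob of the preferred response
    ref_logp_l: float,  # reference-model log-prob of the rejected response
    beta: float = 0.1,  # assumed temperature; an illustrative default
) -> float:
    """Standard DPO loss for one pair: -log sigmoid(beta * log-ratio margin)."""
    # The implicit reward of each response is beta * (policy - reference)
    # log-probability, so no separate reward model is trained.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


def preference_probability(
    logp_w: float,
    logp_l: float,
    ref_logp_w: float,
    ref_logp_l: float,
    beta: float = 0.1,
) -> float:
    """Bradley-Terry probability that the preferred response wins."""
    return math.exp(-dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta))
```

When the policy equals the reference model, the margin is zero, the modeled preference probability is 0.5, and the loss is log 2. Raising the policy's log-probability of the preferred response relative to the reference lowers the loss.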
