Explicit Preference Optimization: No Need for an Implicit Reward Model

Xiangkun Hu; Lemin Kong; Tong He; David Wipf

arXiv:2506.07492 · cs.LG · June 10, 2025

TL;DR

This paper introduces EXPO, an explicit preference optimization framework for training large language models that avoids the pitfalls of implicit reward reparameterizations used in prior methods like DPO, leading to more transparent and effective preference alignment.

Contribution

The paper proposes EXPO, a novel explicit preference optimization method that eliminates the need for reparameterized implicit rewards, addressing limitations of existing DPO-based approaches.

Findings

01. EXPO outperforms DPO in preference alignment tasks.

02. EXPO demonstrates more transparent regularization and avoids counter-intuitive behaviors.

03. Empirical results validate the theoretical advantages of EXPO over implicit reward methods.

Abstract

The generated responses of large language models (LLMs) are often fine-tuned to human preferences through a process called reinforcement learning from human feedback (RLHF). As RLHF relies on a challenging training sequence, whereby a separate reward model is independently learned and then later applied to LLM policy updates, ongoing research effort has targeted more straightforward alternatives. In this regard, direct preference optimization (DPO) and its many offshoots circumvent the need for a separate reward training step. Instead, through the judicious use of a reparameterization trick that induces an implicit reward, DPO and related methods consolidate learning to the minimization of a single loss function. And yet despite demonstrable success in some real-world settings, we prove that DPO-based objectives are nonetheless subject to sub-optimal regularization and…
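
For readers unfamiliar with the reparameterization trick the abstract mentions, the following is a minimal sketch of the standard DPO objective for a single preference pair, as it is commonly stated in the DPO literature. It is not taken from this paper or its repository, and it does not implement EXPO. The function names, the argument names, and the default beta = 0.1 are illustrative assumptions. The implicit reward is beta times the log-ratio between the policy and a frozen reference model, and the loss is the negative log-sigmoid of the reward margin between the preferred and dispreferred responses.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def implicit_reward(logp_policy: float, logp_ref: float, beta: float) -> float:
    """Implicit reward induced by the reparameterization:
    beta * (log pi(y|x) - log pi_ref(y|x)).
    Inputs are sequence-level log-probabilities (hypothetical names)."""
    return beta * (logp_policy - logp_ref)


def dpo_loss(
    logp_w: float,      # policy log-prob of the preferred response
    logp_l: float,      # policy log-prob of the dispreferred response
    ref_logp_w: float,  # reference log-prob of the preferred response
    ref_logp_l: float,  # reference log-prob of the dispreferred response
    beta: float = 0.1,  # assumed value, chosen only for illustration
) -> float:
    """Standard DPO loss for one preference pair:
    -log sigmoid(r(y_w) - r(y_l)), with r the implicit reward above."""
    margin = implicit_reward(logp_w, ref_logp_w, beta) - implicit_reward(
        logp_l, ref_logp_l, beta
    )
    return -log_sigmoid(margin)


if __name__ == "__main__":
    # When the policy equals the reference, the margin is zero.
    print(dpo_loss(-5.0, -7.0, -5.0, -7.0))
    # Raising the preferred response's log-prob lowers the loss.
    print(dpo_loss(-4.0, -7.0, -5.0, -7.0))
```

No separate reward model appears anywhere in this sketch: the reward exists only through the policy-to-reference log-ratio, which is the implicit reward that the abstract contrasts with EXPO's explicit formulation.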

Code & Models

Repositories

lmkong020/explicit-preference-optimization
PyTorch · Official

Taxonomy

Topics: Emotion and Mood Recognition · Topic Modeling · Recommender Systems and Techniques

Methods: Direct Preference Optimization