Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier

Anirudhan Badrinath; Prabhat Agarwal; Jiajing Xu

arXiv:2405.17956 · cs.AI · May 27, 2025

TL;DR

This paper introduces Unified Preference Optimization, a novel method that combines the simplicity of direct preference optimization with the flexibility of reinforcement learning, enabling better alignment of language models to user and designer preferences without extra data or instability.

Contribution

It proposes a unified framework that allows tuning language models to optimize both user and designer preferences simultaneously, overcoming limitations of existing methods like DPO and RLHF.

Findings

01 Generalizes effectively to both user preferences and auxiliary objectives.

02 Preserves or surpasses alignment performance on benchmarks.

03 Requires no additional preference data and introduces no training stability issues.

Abstract

For aligning large language models (LLMs), prior work has leveraged reinforcement learning from human feedback (RLHF) or variations of direct preference optimization (DPO). While DPO offers a simpler framework based on maximum likelihood estimation, it compromises on the ability to easily tune language models to maximize auxiliary, non-preferential objectives according to the LLM designer's preferences (e.g., tuning lexical style or minimizing specific kinds of harmful content). Critically, these designer objectives may not be amply human-labeled or represented in available data, may not align with user preferences, and may not even be tractably captured by binary preference pairs. To leverage the simplicity and performance of DPO with the generality of RL, we propose a unified approach. Based on a simple decomposition of preference and auxiliary objectives, we allow for tuning LLMs to…
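To make the idea of decomposing preference and auxiliary objectives concrete, here is a minimal sketch in plain Python. The `dpo_loss` function follows the standard DPO objective for a single preference pair. The `combined_loss` function, the reward-weighted auxiliary term, and the `aux_reward` and `aux_weight` parameters are hypothetical illustrations of how a designer objective might be added alongside it; they are assumptions, not the objective defined in the paper.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the policy and under a
    frozen reference model.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return -log_sigmoid(beta * margin)


def combined_loss(logp_chosen: float, logp_rejected: float,
                  ref_logp_chosen: float, ref_logp_rejected: float,
                  aux_reward: float, aux_weight: float = 1.0,
                  beta: float = 0.1) -> float:
    """Hypothetical sum of a preference term and an auxiliary term.

    ASSUMPTION: the auxiliary term is a simple reward-weighted negative
    log-likelihood of the chosen response, used here only for illustration.
    It is not taken from the paper.
    """
    preference_term = dpo_loss(logp_chosen, logp_rejected,
                               ref_logp_chosen, ref_logp_rejected, beta)
    auxiliary_term = -aux_reward * logp_chosen
    return preference_term + aux_weight * auxiliary_term
```

With `aux_weight=0.0` the sketch reduces to plain DPO, so the auxiliary weight acts as a dial between pure preference fitting and the designer's objective.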

Taxonomy

Topics: Economic and Environmental Valuation

Methods: Direct Preference Optimization · Sparse Evolutionary Training · ALIGN