Unified Preference Optimization: Language Model Alignment Beyond the Preference Frontier
Anirudhan Badrinath, Prabhat Agarwal, Jiajing Xu

TL;DR
This paper introduces Unified Preference Optimization, a novel method that combines the simplicity of direct preference optimization with the flexibility of reinforcement learning, enabling better alignment of language models to user and designer preferences without extra data or instability.
Contribution
It proposes a unified framework that allows tuning language models to optimize both user and designer preferences simultaneously, overcoming limitations of existing methods like DPO and RLHF.
Findings
Effective generalization to user preferences and auxiliary objectives.
Preserves or surpasses alignment performance on benchmarks.
Requires no additional preference data and introduces no training stability issues.
Abstract
For aligning large language models (LLMs), prior work has leveraged reinforcement learning via human feedback (RLHF) or variations of direct preference optimization (DPO). While DPO offers a simpler framework based on maximum likelihood estimation, it compromises on the ability to easily tune language models to maximize auxiliary, non-preferential objectives according to the LLM designer's preferences (e.g., tuning lexical style or minimizing specific kinds of harmful content). Critically, these designer objectives may not be amply human-labeled or represented in available data, align with user preferences, or even be able to be captured tractably by binary preference pairs. To leverage the simplicity and performance of DPO with the generality of RL, we propose a unified approach. Based on a simple decomposition of preference and auxiliary objectives, we allow for tuning LLMs to…
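For readers unfamiliar with the maximum-likelihood framing mentioned in the abstract, the following is a minimal sketch of the standard DPO loss for a single preference pair. It is not the paper's unified objective: the abstract is truncated before the auxiliary-objective term is described, so that part is deliberately left out. The function names, argument names, and the default `beta=0.1` are illustrative assumptions.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(
    logp_chosen: float,
    logp_rejected: float,
    ref_logp_chosen: float,
    ref_logp_rejected: float,
    beta: float = 0.1,  # assumed value; controls how far the policy may drift from the reference
) -> float:
    """Standard DPO loss for one (chosen, rejected) pair.

    Inputs are sequence log-probabilities under the policy being trained
    and under a frozen reference model.
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # response over the rejected one, relative to the reference model.
    margin = beta * (
        (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    )
    # -log(sigmoid(margin)) equals softplus(-margin).
    return softplus(-margin)


if __name__ == "__main__":
    # Policy identical to the reference: the margin is zero.
    print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
    # Policy has shifted probability toward the chosen response.
    print(dpo_loss(-0.5, -3.0, -1.0, -2.0))
```

When the policy matches the reference the loss equals log 2, and it falls as the policy assigns relatively more probability to the chosen response. Because the loss depends only on preference pairs, an objective that cannot be expressed as such pairs has no direct place in it, which is the limitation the abstract describes.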
Taxonomy
Topics: Economic and Environmental Valuation
Methods: Direct Preference Optimization · Sparse Evolutionary Training · ALIGN
