Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO
Kaiyang Guo, Yinchuan Li, Zhitang Chen

TL;DR
This paper introduces PRO, a new alignment method for large language models that effectively handles diverse feedback types and addresses the limitations of traditional contrastive alignment methods.
Contribution
The paper provides a principled decomposition of DPO, identifies the cause of likelihood underdetermination, and proposes PRO, a unified method that improves alignment with various feedback types.
Findings
PRO outperforms existing methods on multiple feedback types
Restoring the full regularizer resolves likelihood underdetermination
PRO demonstrates consistent improvements in empirical evaluations
Abstract
Direct alignment methods typically train large language models (LLMs) by contrasting the likelihoods of preferred and dispreferred responses. While effective at capturing relative preferences, these methods are widely observed to suppress the absolute likelihoods of example responses. As a result, aligned models can deviate from expected patterns, exhibiting reward-hacking effects even without an explicit reward model. This fundamental limitation of contrastive alignment, which we term likelihood underdetermination, motivates us to revisit direct preference optimization (DPO) -- the seminal direct alignment method. Interestingly, we show that the DPO loss admits a principled decomposition. The reformulated loss not only extends naturally to a broader range of feedback types, but also unveils the root cause of likelihood underdetermination. Specifically, we identify that standard DPO…
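As a rough illustration of the contrastive setup the abstract describes, the sketch below implements the widely used per-example DPO loss in plain Python and shows that it depends only on the difference between the two policy-to-reference log-ratios. The function name `dpo_loss`, the `beta` value, and the log-probabilities are illustrative assumptions, not values from the paper, and the sketch does not reproduce the paper's decomposition or the PRO method.

```python
import math


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-example standard DPO loss: -log(sigmoid(beta * margin)).

    logp_w / logp_l are the policy's sequence log-likelihoods of the
    preferred and dispreferred responses; ref_* are the same quantities
    under the frozen reference model. All inputs here are made-up numbers.
    """
    # Margin between the two policy-to-reference log-ratios.
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    z = beta * margin
    # Numerically stable -log(sigmoid(z)), i.e. softplus(-z).
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical log-likelihoods for one preference pair.
base = dpo_loss(logp_w=-12.0, logp_l=-15.0, ref_logp_w=-13.0, ref_logp_l=-14.0)

# Lower BOTH policy log-likelihoods by the same amount: the margin,
# and therefore the loss, is unchanged.
shifted = dpo_loss(logp_w=-17.0, logp_l=-20.0, ref_logp_w=-13.0, ref_logp_l=-14.0)

print(base, shifted)
```

Lowering both responses' log-likelihoods by the same amount leaves this loss unchanged. That is one simple way to see how a purely contrastive objective can leave absolute likelihoods unconstrained.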
Taxonomy
Topics: Constraint Satisfaction and Optimization · Recommender Systems and Techniques · Advanced Multi-Objective Optimization Algorithms
Methods: Direct Preference Optimization · ALIGN
