Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO

Kaiyang Guo; Yinchuan Li; Zhitang Chen

arXiv:2505.23316 · cs.CL · December 4, 2025


TL;DR

This paper introduces PRO, a new alignment method for large language models that effectively handles diverse feedback types and addresses the limitations of traditional contrastive alignment methods.

Contribution

The paper provides a principled decomposition of DPO, identifies the cause of likelihood underdetermination, and proposes PRO, a unified method that improves alignment with various feedback types.

Findings

1. PRO outperforms existing methods on multiple feedback types.
2. Restoring the full regularizer resolves likelihood underdetermination.
3. PRO demonstrates consistent improvements in empirical evaluations.

Abstract

Direct alignment methods typically train large language models (LLMs) by contrasting the likelihoods of preferred and dispreferred responses. While effective at capturing relative preferences, these methods are widely observed to suppress the absolute likelihoods of example responses. As a result, aligned models can deviate from expected patterns, exhibiting reward-hacking effects even without an explicit reward model. This fundamental limitation of contrastive alignment, which we term likelihood underdetermination, motivates us to revisit direct preference optimization (DPO), the seminal direct alignment method. Interestingly, we show that the DPO loss admits a principled decomposition. The reformulated loss not only extends naturally to a broader range of feedback types, but also unveils the root cause of likelihood underdetermination. Specifically, we identify that standard DPO…
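The contrastive structure described above can be seen directly in the standard DPO objective, which scores a pair only through the difference of policy-to-reference log-ratios. The sketch below is a minimal illustration under stated assumptions: each response is summarized by a single scalar sequence log-probability, β = 0.1 is an arbitrary illustrative value, and the helper names `log_sigmoid` and `dpo_loss` are hypothetical. It implements standard DPO only, not the paper's decomposed loss or PRO, whose details are not reproduced in this excerpt.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(logp_w: float, logp_l: float,
             ref_logp_w: float, ref_logp_l: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one (preferred w, dispreferred l) pair.

    Inputs are sequence log-probabilities under the trained policy
    and the frozen reference model (assumed scalars for illustration).
    """
    # The loss sees only the *difference* of the two log-ratios.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -log_sigmoid(margin)


# Same reference model for two hypothetical policies.
ref_w, ref_l = -10.0, -10.0
loss_a = dpo_loss(-9.0, -11.0, ref_w, ref_l)   # preferred up, dispreferred down
loss_b = dpo_loss(-20.0, -22.0, ref_w, ref_l)  # both likelihoods pushed far down
print(loss_a == loss_b)  # True: both pairs have the same margin (0.2)
```

Because only the margin enters the loss, a policy that lowers the likelihood of both the preferred and dispreferred responses can score exactly as well as one that raises the preferred response, which is one way to read the "likelihood underdetermination" the abstract describes.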


Taxonomy

Topics: Constraint Satisfaction and Optimization · Recommender Systems and Techniques · Advanced Multi-Objective Optimization Algorithms

Methods: Direct Preference Optimization · ALIGN