Stable Preference Optimization: A Bilevel Approach to Catastrophic Preference Shift

Chengtao Jian; Kai Yang; Tianhao Gao; Wuguang Ni; Keying Yang; Bowen Xiao; Jiajun Liu; Ye Ouyang

arXiv:2507.07723 · cs.AI · January 7, 2026

TL;DR

This paper identifies Catastrophic Preference Shift, a failure mode of Bradley-Terry-style direct preference learning in which probability mass lost by preferred responses shifts toward out-of-distribution responses. It analyzes the causes and proposes Stable Preference Optimization, a bilevel framework, to mitigate the shift and stabilize alignment.

Contribution

The paper introduces Stable Preference Optimization (SPO), a theoretically grounded framework that constrains preference learning to address the limitations of existing Bradley-Terry (BT)-style methods and make alignment more reliable.

Findings

01. SPO stabilizes preference learning and improves performance.

02. Existing BT-style methods suffer from preference shift and performance degradation.

03. SPO outperforms baseline methods in empirical evaluations.

Abstract

Direct Preference Learning has emerged as a dominant offline paradigm for preference optimization. Most of these methods are based on the Bradley-Terry (BT) model for pairwise preference ranking, which directly aligns language models with human preferences. Prior work has observed a counter-intuitive phenomenon termed likelihood displacement, where the absolute probability of preferred responses decreases during training. We demonstrate that such displacement can lead to a more devastating failure mode, which we define as Catastrophic Preference Shift, where the lost preference probability mass inadvertently shifts toward out-of-distribution (OOD) responses. Such a failure mode is a key limitation shared across BT-style direct preference learning methods, due to the fundamental conflict between the unconstrained discriminative alignment and generative foundational…
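
The displacement described in the abstract can be illustrated with a minimal sketch. It assumes a DPO-style loss as one representative BT-style objective; the three-response toy distribution, the function names, and the choice of beta = 1 are hypothetical and are not taken from the paper. The point it tries to show is that a BT-style loss depends only on the margin between preferred and dispreferred log-ratios, so the loss can fall while the preferred response loses probability and the freed mass lands on other responses.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def bt_style_loss(policy: dict, reference: dict, beta: float = 1.0) -> float:
    """DPO-style pairwise loss (an assumed stand-in for a BT-style method).

    Only the difference of log-ratios (the margin) enters the loss;
    the absolute probability of the preferred response does not.
    """
    ratio_w = math.log(policy["preferred"]) - math.log(reference["preferred"])
    ratio_l = math.log(policy["dispreferred"]) - math.log(reference["dispreferred"])
    return -log_sigmoid(beta * (ratio_w - ratio_l))


# Hypothetical toy distributions over three responses; "other" stands in
# for everything outside the preference pair, including OOD responses.
reference = {"preferred": 0.40, "dispreferred": 0.40, "other": 0.20}
policy_after = {"preferred": 0.30, "dispreferred": 0.01, "other": 0.69}

loss_before = bt_style_loss(reference, reference)     # margin is zero
loss_after = bt_style_loss(policy_after, reference)   # margin is log(30)

print(f"loss before: {loss_before:.4f}, loss after: {loss_after:.4f}")
print(f"preferred prob: {reference['preferred']} -> {policy_after['preferred']}")
print(f"other mass:     {reference['other']} -> {policy_after['other']}")
```

In this toy setting the loss drops even though the preferred response becomes less likely and most of the lost mass moves to "other", which is the kind of unconstrained behavior the abstract attributes to BT-style objectives. The sketch does not implement SPO.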

Taxonomy

Topics: Constraint Satisfaction and Optimization · Advanced Multi-Objective Optimization Algorithms · Multi-Criteria Decision Making