Towards Disentangled Preference Optimization Dynamics: Suppress the Loser, Preserve the Winner

Wei Chen; Yubing Wu; Junmei Yang; Delu Zeng; Qibin Zhao; John Paisley; Min Chen; Zhou Wang

arXiv:2604.18239·cs.LG·May 4, 2026

TL;DR

This paper introduces a unified framework for analyzing preference optimization in language models, identifies conditions under which training suppresses rejected responses while preserving preferred ones, and proposes a practical reward calibration method to improve training dynamics.

Contribution

The paper reveals an incentive-score decomposition shared across preference objectives, introduces the disentanglement band (DB) condition, and proposes a reward calibration method to improve preference optimization.
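
For standard margin-based objectives, the idea that objectives share update directions and differ only in scalar weights can be illustrated with a short sketch. It assumes each loss depends on the policy only through the log-ratio margin h = [log π_θ(y_w) − log π_ref(y_w)] − [log π_θ(y_l) − log π_ref(y_l)]. Because π_ref is fixed, the chain rule gives a descent direction of −ℓ′(h)·(∇log π_θ(y_w) − ∇log π_θ(y_l)), so the objectives differ only in the scalar −ℓ′(h). The loss forms are the commonly published DPO and IPO losses plus a generic hinge loss; the helper names (`dpo_weight`, `ipo_weight`, `hinge_weight`, `update_direction`) and default hyperparameters are hypothetical. This is not the paper's decomposition, which may assign separate weights to the chosen and rejected terms.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Each helper returns the scalar weight -dL/dh for a loss L(h), where h is the
# unscaled log-ratio margin between the chosen and rejected responses.

def dpo_weight(h: float, beta: float = 0.1) -> float:
    # DPO: L = -log sigmoid(beta * h)  =>  -dL/dh = beta * sigmoid(-beta * h)
    return beta * sigmoid(-beta * h)


def ipo_weight(h: float, tau: float = 0.1) -> float:
    # IPO: L = (h - 1/(2*tau))**2  =>  -dL/dh = 2 * (1/(2*tau) - h)
    # The weight turns negative once the margin overshoots its target.
    return 2.0 * (1.0 / (2.0 * tau) - h)


def hinge_weight(h: float, delta: float = 1.0) -> float:
    # Generic hinge: L = max(0, delta - h)  =>  -dL/dh = 1 if h < delta else 0
    return 1.0 if h < delta else 0.0


def update_direction(weight: float,
                     grad_logp_w: Sequence[float],
                     grad_logp_l: Sequence[float]) -> List[float]:
    # Shared direction for every margin loss:
    # weight * (grad log pi(y_w) - grad log pi(y_l))
    return [weight * (gw - gl) for gw, gl in zip(grad_logp_w, grad_logp_l)]
```

At a zero margin, DPO with β = 0.1 gives a weight of 0.05, IPO with τ = 0.1 gives 10.0, and the hinge loss gives 1.0. Only the scale of the step changes; the direction passed to `update_direction` is the same for all three.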

Findings

01. Reward calibration improves disentangled training dynamics.

02. The method achieves better downstream performance.

03. The framework unifies the analysis of various preference objectives.

Abstract

Preference optimization is widely used to align large language models (LLMs) with human preferences. However, many margin-based methods also suppress the chosen response when they try to suppress the rejected one, and there is no general way to prevent this across different objectives. We address this issue with a unified incentive-score decomposition of preference optimization, revealing that different objectives share the same local update directions and differ only in their scalar weights. This decomposition provides a common framework for analyzing objectives that were previously studied in separate settings. Building on this decomposition and analyzing the dynamics of the chosen and rejected likelihoods, we identify the disentanglement band (DB), a simple, testable condition that indicates when training can follow the desired path: suppress the loser while preserving the winner,…
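
The goal of suppressing the loser while preserving the winner can be checked to first order with a small sketch. It assumes a single gradient-ascent step Δθ = η(a_w g_w − a_l g_l), where g_w and g_l are the gradients of log π_θ(y_w) and log π_θ(y_l) and a_w, a_l ≥ 0 are scalar weights. A first-order Taylor expansion then predicts Δlog π_θ(y_w) ≈ η(a_w‖g_w‖² − a_l⟨g_w, g_l⟩) and Δlog π_θ(y_l) ≈ η(a_w⟨g_w, g_l⟩ − a_l‖g_l‖²). When ⟨g_w, g_l⟩ > 0, the desired outcome requires the ratio a_w/a_l to lie in [⟨g_w, g_l⟩/‖g_w‖², ‖g_l‖²/⟨g_w, g_l⟩). This is our own illustration of the kind of condition the abstract describes, not the paper's exact disentanglement band, and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean inner product of two gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def first_order_changes(grad_w: Sequence[float], grad_l: Sequence[float],
                        a_w: float, a_l: float,
                        lr: float = 1.0) -> Tuple[float, float]:
    """Predicted (d log pi(y_w), d log pi(y_l)) after one ascent step
    theta += lr * (a_w * grad_w - a_l * grad_l), to first order."""
    gww = dot(grad_w, grad_w)
    gll = dot(grad_l, grad_l)
    gwl = dot(grad_w, grad_l)
    d_w = lr * (a_w * gww - a_l * gwl)
    d_l = lr * (a_w * gwl - a_l * gll)
    return d_w, d_l


def suppress_loser_preserve_winner(grad_w: Sequence[float],
                                   grad_l: Sequence[float],
                                   a_w: float, a_l: float) -> bool:
    """True if, to first order, the chosen log-likelihood does not drop
    and the rejected log-likelihood strictly drops."""
    d_w, d_l = first_order_changes(grad_w, grad_l, a_w, a_l)
    return d_w >= 0.0 and d_l < 0.0


def ratio_band(grad_w: Sequence[float],
               grad_l: Sequence[float]) -> Tuple[float, float]:
    """Admissible half-open range [low, high) for a_w / a_l, assuming
    a_l > 0 and nonzero gradients. If the gradients are not positively
    aligned, any nonnegative ratio works."""
    gww = dot(grad_w, grad_w)
    gll = dot(grad_l, grad_l)
    gwl = dot(grad_w, grad_l)
    if gwl <= 0.0:
        return 0.0, math.inf
    # Cauchy-Schwarz gives gwl / gww <= gll / gwl, so the band is
    # non-empty unless the two gradients are parallel.
    return gwl / gww, gll / gwl
```

With g_w = (1, 0) and g_l = (2, 1), equal weights lower both log-likelihoods (predicted changes of −1 and −3), which is the failure mode the abstract describes. The band is [2.0, 2.5), and choosing a_w = 2.25 with a_l = 1 raises log π_θ(y_w) by 0.25 while lowering log π_θ(y_l) by 0.5.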

Code & Models

Repositories

IceyWuu/DisentangledPreferenceOptimization (GitHub)