Robust Preference Optimization via Dynamic Target Margins
Jie Sun, Junkang Wu, Jiancan Wu, Zhibo Zhu, Xingyu Lu, Jun Zhou, Lintao Ma, Xiang Wang

TL;DR
This paper introduces $\gamma$-PO, a dynamic margin preference optimization method that improves Large Language Model alignment by adaptively calibrating reward margins at the pair level, boosting performance with minimal efficiency loss.
Contribution
The paper proposes $\gamma$-PO, a novel, plug-and-play dynamic margin calibration technique that enhances preference optimization for LLMs by effectively handling noisy data.
Findings
Achieves 4.4% average improvement on benchmarks
Sets new state-of-the-art performance for LLM alignment
Requires minimal code changes and maintains training efficiency
Abstract
The alignment of Large Language Models (LLMs) is crucial for ensuring their safety and reliability in practical applications. Direct Preference Optimization (DPO) has emerged as an efficient method that directly optimizes models using preference pairs, significantly reducing resource demands. However, the effectiveness of DPO depends heavily on data quality, which is frequently compromised by noise. In this work, we propose $\gamma$-PO, a dynamic target margin preference optimization algorithm that adjusts reward margins at the pairwise level. By introducing instance-specific margin calibration, $\gamma$-PO strategically prioritizes high-confidence pairs (those demonstrating higher reward margins) while suppressing potential noise from ambiguous pairs. Moreover, $\gamma$-PO is a plug-and-play method, compatible with variants of DPO that rely on the reward margin between preference pairs.…
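To make the idea of pair-level margin calibration concrete, here is a minimal sketch in plain Python. It is not the paper's formula, which the truncated abstract does not state. It assumes a hypothetical rule in which each pair's target margin is a softmax weight over the batch's reward margins, rescaled so the average margin equals a base value `gamma0`. The names `dynamic_margins`, `margin_loss`, `gamma0`, `tau`, and `beta` are illustrative only.

```python
import math


def dynamic_margins(reward_margins, gamma0=1.0, tau=1.0):
    """Hypothetical calibration rule (an assumption, not the paper's):
    softmax over the batch's reward margins, rescaled so that the mean
    target margin equals gamma0. Pairs with a larger reward margin get
    a larger target margin."""
    peak = max(reward_margins)  # subtract the max for numerical stability
    weights = [math.exp((r - peak) / tau) for r in reward_margins]
    total = sum(weights)
    n = len(reward_margins)
    return [gamma0 * n * w / total for w in weights]


def margin_loss(reward_margins, margins, beta=1.0):
    """Mean of -log(sigmoid(beta * (r_i - gamma_i))) over all pairs.
    In real training the margins would be treated as constants
    (no gradient flows through them)."""
    losses = []
    for r, g in zip(reward_margins, margins):
        z = beta * (r - g)
        # -log(sigmoid(z)) equals softplus(-z), written in a stable form
        losses.append(max(-z, 0.0) + math.log1p(math.exp(-abs(z))))
    return sum(losses) / len(losses)


# Example batch: one confident pair, one ambiguous pair, one likely-noisy pair
batch = [2.0, 0.1, -0.5]
targets = dynamic_margins(batch, gamma0=1.0, tau=1.0)
loss = margin_loss(batch, targets)
```

Under this assumed rule, a fixed-margin loss is recovered when all reward margins in the batch are equal, since every pair then receives exactly `gamma0`. The temperature `tau` controls how sharply the targets concentrate on high-margin pairs.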
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Text and Document Classification Technologies
