$\beta$-DPO: Direct Preference Optimization with Dynamic $\beta$

Junkang Wu; Yuexiang Xie; Zhengyi Yang; Jiancan Wu; Jinyang Gao; Bolin Ding; Xiang Wang; Xiangnan He

arXiv:2407.08639 · cs.AI · October 15, 2024

TL;DR

This paper introduces a dynamic $\beta$ calibration method for Direct Preference Optimization (DPO) that improves LLM alignment with human preferences by adapting $\beta$ to data quality and filtering out outliers.
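For reference, $\beta$ enters the standard DPO objective as the coefficient on the policy-to-reference log-probability ratios. Here $\pi_\theta$ is the policy being trained, $\pi_{\mathrm{ref}}$ is the frozen reference model, $y_w$ and $y_l$ are the preferred and dispreferred responses to prompt $x$, and $\sigma$ is the logistic function. This is the static-$\beta$ baseline; the paper's contribution is to calibrate $\beta$ per batch instead of fixing it.

$$
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}}) = -\,\mathbb{E}_{(x, y_w, y_l) \sim \mathcal{D}}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

A larger $\beta$ acts as a stronger implicit KL penalty keeping $\pi_\theta$ close to $\pi_{\mathrm{ref}}$; a smaller $\beta$ lets the policy move further from it.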

Contribution

It proposes a novel framework that dynamically adjusts $\beta$ during training and incorporates data filtering, enhancing DPO's robustness and performance.
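To make batch-level $\beta$ calibration concrete, here is a hypothetical sketch. It assumes each preference pair supplies the policy-minus-reference log-probability ratios of its chosen and rejected responses, measures each pair's implicit reward margin at a base value `beta0`, and rescales `beta0` by how far the batch's mean margin sits from a running threshold kept as an exponential moving average. The class `DynamicBeta`, its parameters (`alpha`, `momentum`, `min_beta`), and the direction of the adjustment are illustrative assumptions, not the official implementation in junkangwu/beta-dpo.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


def reward_margins(chosen_logratios: Sequence[float],
                   rejected_logratios: Sequence[float],
                   beta0: float) -> List[float]:
    """Implicit DPO reward margin of each pair, measured at the base beta:
    beta0 * (log pi/pi_ref of chosen - log pi/pi_ref of rejected)."""
    return [beta0 * (c - r) for c, r in zip(chosen_logratios, rejected_logratios)]


@dataclass
class DynamicBeta:
    """Hypothetical batch-level beta calibrator (illustrative only)."""
    beta0: float = 0.1      # base beta (assumed value)
    alpha: float = 0.5      # how strongly beta reacts to the margin deviation (assumed)
    momentum: float = 0.9   # EMA factor for the running threshold (assumed)
    threshold: float = 0.0  # running estimate of a "typical" batch margin
    min_beta: float = 1e-3  # floor keeping beta positive (our own safeguard)

    def step(self, margins: Sequence[float]) -> float:
        """Return the beta for this batch, then update the running threshold."""
        batch_mean = sum(margins) / len(margins)
        # Assumed rule: raise beta when the batch is more clearly separated than
        # usual (mean margin above the threshold), lower it when less separated.
        beta = self.beta0 * (1.0 + self.alpha * (batch_mean - self.threshold))
        # Exponential moving average of the batch mean margins.
        self.threshold = self.momentum * self.threshold + (1.0 - self.momentum) * batch_mean
        return max(beta, self.min_beta)


def dpo_loss(chosen_logratios: Sequence[float],
             rejected_logratios: Sequence[float],
             beta: float) -> float:
    """Mean DPO loss, -log sigmoid(beta * (chosen - rejected)), over the batch."""
    total = 0.0
    for c, r in zip(chosen_logratios, rejected_logratios):
        z = beta * (c - r)
        # Numerically stable -log(sigmoid(z)) = log(1 + exp(-z)).
        total += math.log1p(math.exp(-z)) if z >= 0 else -z + math.log1p(math.exp(z))
    return total / len(chosen_logratios)


if __name__ == "__main__":
    chosen = [2.0, 1.0]      # log pi/pi_ref of preferred responses (toy values)
    rejected = [0.0, -1.0]   # log pi/pi_ref of dispreferred responses (toy values)
    calib = DynamicBeta()
    beta = calib.step(reward_margins(chosen, rejected, calib.beta0))
    print(beta, dpo_loss(chosen, rejected, beta))
```

Under these assumptions, a batch whose pairs are more clearly separated than the running threshold receives a larger $\beta$, a less separated batch a smaller one, and the resulting value replaces the fixed $\beta$ in the loss for that batch.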

Findings

1. Dynamic $\beta$ improves model alignment with human preferences.
2. Data filtering reduces the impact of outliers.
3. The method yields significant performance gains across models and datasets.

Abstract

Direct Preference Optimization (DPO) has emerged as a compelling approach for training Large Language Models (LLMs) to adhere to human preferences. However, the performance of DPO is sensitive to the fine-tuning of its trade-off parameter $\beta$, as well as to the quality of the preference data. We analyze the impact of $\beta$ and data quality on DPO, uncovering that optimal $\beta$ values vary with the informativeness of pairwise data. Addressing the limitations of static $\beta$ values, we introduce a novel framework that dynamically calibrates $\beta$ at the batch level, informed by data quality considerations. Additionally, our method incorporates $\beta$-guided data filtering to safeguard against the influence of outliers. Through empirical evaluation, we demonstrate that our dynamic $\beta$ adjustment technique significantly improves DPO's performance across a range of models…
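The outlier filtering described in the abstract can be illustrated with a simple rule. In this hypothetical sketch, each pair's implicit reward margin is compared with a running center (an exponential moving average of past margins), and only the fraction `keep_ratio` of pairs closest to that center is kept; pairs farthest from it on either side are treated as outliers. The deterministic closest-k rule and the names `filter_outliers`, `update_center`, `center`, and `keep_ratio` are assumptions for illustration; how the official method couples filtering to $\beta$ may differ.

```python
import math
from typing import List, Sequence


def filter_outliers(margins: Sequence[float], center: float,
                    keep_ratio: float = 0.8) -> List[int]:
    """Return the sorted indices of the pairs to keep.

    Keeps the round(keep_ratio * n) pairs (at least one) whose reward margins
    lie closest to `center`, and drops the rest as outliers.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n = len(margins)
    if n == 0:
        return []
    # round() rather than ceil(), so a product landing a hair above an
    # integer does not keep one extra pair.
    k = max(1, round(keep_ratio * n))
    by_distance = sorted(range(n), key=lambda i: abs(margins[i] - center))
    return sorted(by_distance[:k])


def update_center(center: float, kept_margins: Sequence[float],
                  momentum: float = 0.9) -> float:
    """EMA update of the running center from the margins that survived filtering."""
    if not kept_margins:
        return center
    return momentum * center + (1.0 - momentum) * (sum(kept_margins) / len(kept_margins))


if __name__ == "__main__":
    margins = [0.1, 0.2, 0.15, 5.0, -4.0]  # toy margins; 5.0 and -4.0 are far from the center
    center = 0.15
    keep = filter_outliers(margins, center, keep_ratio=0.6)  # keeps indices 0, 1, 2
    center = update_center(center, [margins[i] for i in keep])
    print(keep, center)
```

In this sketch, the kept pairs are the ones used for the loss (and for any batch-level $\beta$), and only they update the running center, so extreme pairs cannot drag it toward themselves.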

Code & Models

Repositories

junkangwu/beta-dpo
PyTorch · Official

Taxonomy

Topics: Constraint Satisfaction and Optimization · Data Management and Algorithms · Rough Sets and Fuzzy Logic

Methods: Direct Preference Optimization