Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization

Junkang Wu; Yuexiang Xie; Zhengyi Yang; Jiancan Wu; Jiawei Chen; Jinyang Gao; Bolin Ding; Xiang Wang; Xiangnan He

arXiv:2407.07880·cs.LG·April 21, 2025

TL;DR

This paper introduces Dr. DPO, a robust training method for LLMs that mitigates noise in preference data by leveraging distributionally robust optimization principles, leading to improved text quality and response accuracy.

Contribution

It presents a novel framework, Dr. DPO, that extends DPO with distributionally robust optimization to handle both pointwise and pairwise noise in preference datasets.

Findings

01 Dr. DPO significantly improves text quality and response accuracy.

02 Theoretical analysis shows DPO's inherent robustness to pointwise noise.

03 Empirical results demonstrate enhanced performance in noisy environments.

Abstract

This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robust Optimization (DRO), we enhance DPO's resilience to these types of noise. Our theoretical insights reveal that DPO inherently embeds DRO principles, conferring robustness to pointwise noise, with the regularization coefficient $β$ playing a critical role in its noise resistance. Extending this framework, we introduce Distributionally Robustifying DPO (Dr. DPO), which integrates pairwise robustness by optimizing against worst-case pairwise scenarios. The novel hyperparameter $β^{'}$ in Dr.…
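
As a rough illustration of the two hyperparameters named in the abstract, the sketch below computes a per-pair DPO loss scaled by $β$ and then aggregates a batch of such losses with a log-mean-exp controlled by $β^{'}$. The aggregation form is an assumption made for illustration, since the abstract is cut off before it describes the objective, and the function names `dpo_pair_loss` and `dr_dpo_batch_loss` are hypothetical. The junkangwu/dr_dpo repository is the place to check the authors' actual implementation.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_pair_loss(policy_chosen_logp, policy_rejected_logp,
                  ref_chosen_logp, ref_rejected_logp, beta):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy and the
    frozen reference model; beta is the regularization coefficient.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -log_sigmoid(beta * margin)


def dr_dpo_batch_loss(pair_losses, beta_prime):
    """ASSUMED aggregation: -beta' * log(mean(exp(-loss / beta'))).

    Illustrative only; this is not confirmed by the abstract.
    Uses the log-sum-exp trick for numerical stability.
    """
    scaled = [-loss / beta_prime for loss in pair_losses]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return -beta_prime * (log_sum - math.log(len(pair_losses)))


if __name__ == "__main__":
    # Hypothetical per-pair losses; the last one is unusually large,
    # as a pair with a flipped preference label might be.
    losses = [0.2, 0.7, 3.0]
    print("plain mean:", sum(losses) / len(losses))
    for bp in (0.5, 1.0, 10.0):
        print("beta' =", bp, "->", dr_dpo_batch_loss(losses, bp))
```

Under this assumed form, the aggregate always lies between the smallest per-pair loss and the plain mean. A large $β^{'}$ approaches the mean, which matches ordinary DPO batch averaging, while a small $β^{'}$ shrinks the influence of pairs with unusually large loss, such as pairs whose preference label may have been flipped.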

Code & Models

Repositories

junkangwu/dr_dpo (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Speech and dialogue systems

Methods: Direct Preference Optimization