Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
Junkang Wu, Yuexiang Xie, Zhengyi Yang, Jiancan Wu, Jiawei Chen, Jinyang Gao, Bolin Ding, Xiang Wang, Xiangnan He

TL;DR
This paper introduces Dr. DPO, a robust training method for LLMs that mitigates noise in preference data by leveraging distributionally robust optimization principles, leading to improved text quality and response accuracy.
Contribution
The paper presents Dr. DPO, a novel framework that extends DPO with distributionally robust optimization to handle both pointwise and pairwise noise in preference datasets.
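For reference, the first line below is the standard DPO objective that Dr. DPO builds on, where β is the regularization coefficient and (x, y_w, y_l) is a prompt with its preferred and dispreferred responses. The second line is the generic form of distributionally robust optimization over a KL ball of radius η around the data distribution P, together with its standard dual. Both are textbook background stated in common notation (ℓ_θ, Q, η, λ are assumed symbols); they are not the paper's own derivation or the exact Dr. DPO objective.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,y_w,y_l)\sim\mathcal{D}}\left[\log\sigma\!\left(
      \beta\log\frac{\pi_\theta(y_w\mid x)}{\pi_{\mathrm{ref}}(y_w\mid x)}
    - \beta\log\frac{\pi_\theta(y_l\mid x)}{\pi_{\mathrm{ref}}(y_l\mid x)}\right)\right]

\min_\theta \max_{Q:\,D_{\mathrm{KL}}(Q\,\|\,P)\le\eta} \mathbb{E}_{Q}[\ell_\theta],
\qquad
\max_{Q:\,D_{\mathrm{KL}}(Q\,\|\,P)\le\eta} \mathbb{E}_{Q}[\ell_\theta]
  = \min_{\lambda>0}\ \lambda\eta + \lambda\log\mathbb{E}_{P}\!\left[e^{\ell_\theta/\lambda}\right]
```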
Findings
Dr. DPO significantly improves text quality and response accuracy.
Theoretical analysis shows DPO's inherent robustness to pointwise noise.
Empirical results demonstrate enhanced performance in noisy environments.
Abstract
This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robust Optimization (DRO), we enhance DPO's resilience to these types of noise. Our theoretical insights reveal that DPO inherently embeds DRO principles, conferring robustness to pointwise noise, with the regularization coefficient playing a critical role in its noise resistance. Extending this framework, we introduce Distributionally Robustifying DPO (Dr. DPO), which integrates pairwise robustness by optimizing against worst-case pairwise scenarios. The novel hyperparameter in Dr.…
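A minimal sketch, using only the Python standard library, of how a per-pair DPO loss and a pairwise-robust aggregation over a batch could be computed. The DPO part follows the standard DPO loss with regularization coefficient `beta`. The robust part is an assumption for illustration: a log-mean-exp over pairs with a temperature named `beta_prime` here (a hypothetical name), since the abstract is cut off before it states Dr. DPO's formula or names its new hyperparameter. All function names are likewise hypothetical.

```python
import math
from typing import List, Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    # For negative z, rewrite as z - log(1 + e^z) so exp never overflows.
    return z - math.log1p(math.exp(z))


def pair_log_likelihoods(policy_chosen: Sequence[float],
                         policy_rejected: Sequence[float],
                         ref_chosen: Sequence[float],
                         ref_rejected: Sequence[float],
                         beta: float = 0.1) -> List[float]:
    """Per-pair DPO term h_i = log sigmoid(beta * implicit-reward margin).

    Inputs are summed log-probabilities log pi(y | x) of the chosen (y_w) and
    rejected (y_l) responses under the policy and the frozen reference model.
    beta is DPO's regularization coefficient.
    """
    h = []
    for pc, pr, rc, rr in zip(policy_chosen, policy_rejected,
                              ref_chosen, ref_rejected):
        margin = beta * ((pc - rc) - (pr - rr))
        h.append(log_sigmoid(margin))
    return h


def dpo_loss(h: Sequence[float]) -> float:
    """Standard DPO loss: negative mean of h over the batch."""
    return -sum(h) / len(h)


def robust_pairwise_loss(h: Sequence[float], beta_prime: float = 1.0) -> float:
    """Hypothetical pairwise-robust loss: -beta' * log mean_i exp(h_i / beta').

    Never larger than dpo_loss(h) (Jensen), and tends to it as beta' grows.
    """
    z = [v / beta_prime for v in h]
    m = max(z)  # log-sum-exp trick: factor out the max for stability
    mean_exp = sum(math.exp(v - m) for v in z) / len(z)
    return -beta_prime * (m + math.log(mean_exp))


def pair_weights(h: Sequence[float], beta_prime: float = 1.0) -> List[float]:
    """Weight each pair's DPO gradient receives under robust_pairwise_loss."""
    # The weights are softmax(h / beta'), computed with the max subtracted.
    z = [v / beta_prime for v in h]
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


if __name__ == "__main__":
    # Toy batch of three pairs (made-up log-probabilities).
    h = pair_log_likelihoods([-10.0, -12.0, -9.0], [-11.0, -11.5, -15.0],
                             [-10.5, -12.0, -10.0], [-10.5, -12.0, -10.0])
    print("DPO loss:", dpo_loss(h))
    print("robust loss:", robust_pairwise_loss(h, beta_prime=1.0))
    print("pair weights:", pair_weights(h, beta_prime=1.0))
```

Plain DPO weights every pair equally. Under the log-mean-exp form, each pair's gradient is scaled by its entry from `pair_weights`, so a pair whose label the implicit reward strongly contradicts (a candidate flipped pair) contributes less. As `beta_prime` grows, the weights flatten and the loss approaches `dpo_loss`.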
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems
Methods: Direct Preference Optimization
