Robust LLM Alignment via Distributionally Robust Direct Preference Optimization
Zaiyan Xu, Sushil Vemuri, Kishan Panaganti, Dileep Kalathil, Rahul Jain, Deepak Ramachandran

TL;DR
This paper introduces two distributionally robust algorithms, WDPO and KLDPO, to improve large language model alignment with human preferences under distribution shift, demonstrating superior performance on benchmark datasets.
Contribution
The paper develops novel distributionally robust preference optimization algorithms for LLM alignment, addressing distribution shift issues with scalable learning methods.
Findings
WDPO and KLDPO outperform existing methods under preference distribution shifts
The algorithms are scalable and suitable for large models
Empirical results show significant alignment improvements
Abstract
A major challenge in aligning large language models (LLMs) with human preferences is the issue of distribution shift. LLM alignment algorithms rely on static preference datasets, assuming that they accurately represent real-world user preferences. However, user preferences vary significantly across geographical regions, demographics, linguistic patterns, and evolving cultural trends. This preference distribution shift leads to catastrophic alignment failures in many real-world applications. We address this problem using the principled framework of distributionally robust optimization, and develop two novel distributionally robust direct preference optimization (DPO) algorithms, namely, Wasserstein DPO (WDPO) and Kullback-Leibler DPO (KLDPO). We characterize the sample complexity of learning the optimal policy parameters for WDPO and KLDPO. Moreover, we propose scalable gradient…
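To make the two ingredients named in the abstract concrete, here is a minimal sketch of the standard per-example DPO loss combined with a generic KL-style worst-case reweighting. It is not the paper's WDPO or KLDPO algorithm, whose details are not given in this summary. The function names, the `beta` value, and the temperature `tau` (a stand-in for the dual variable of a KL-constrained problem; the ball radius is not solved for here) are all assumptions made for illustration.

```python
import math

def dpo_losses(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Per-example DPO loss: -log sigmoid(beta * log-ratio margin).

    Inputs are sequence log-probabilities of the preferred (w) and
    dispreferred (l) responses under the policy and the reference model.
    """
    losses = []
    for pw, pl, rw, rl in zip(policy_logp_w, policy_logp_l, ref_logp_w, ref_logp_l):
        margin = beta * ((pw - rw) - (pl - rl))
        # -log(sigmoid(m)) = log(1 + exp(-m)), written in a numerically stable form
        losses.append(max(-margin, 0.0) + math.log1p(math.exp(-abs(margin))))
    return losses

def kl_worst_case_weights(losses, tau=1.0):
    """Exponential-tilting weights, proportional to exp(loss / tau).

    Assumption: tau is a hypothetical fixed temperature. Smaller tau puts
    more mass on high-loss examples; large tau approaches uniform weights.
    """
    m = max(losses)  # subtract the max before exponentiating for stability
    exps = [math.exp((l - m) / tau) for l in losses]
    z = sum(exps)
    return [e / z for e in exps]

def kl_robust_loss(losses, tau=1.0):
    """Expected loss under the tilted (adversarially reweighted) distribution."""
    weights = kl_worst_case_weights(losses, tau)
    return sum(w * l for w, l in zip(weights, losses))

# Usage: two preference pairs with made-up log-probabilities.
losses = dpo_losses([-1.0, -2.0], [-1.0, -1.0], [-1.0, -1.0], [-1.0, -1.0], beta=1.0)
robust = kl_robust_loss(losses, tau=0.5)
average = sum(losses) / len(losses)
```

Because the tilted weights favor examples with higher loss, `robust` is never smaller than `average`. That is the sense in which a distributionally robust objective guards against shifts toward the preference pairs the current policy handles worst.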
Taxonomy
Topics: Multi-Criteria Decision Making · Advanced Statistical Process Monitoring · Advanced Control Systems Optimization
Methods: Direct Preference Optimization
