Robust LLM Alignment via Distributionally Robust Direct Preference Optimization

Zaiyan Xu; Sushil Vemuri; Kishan Panaganti; Dileep Kalathil; Rahul Jain; Deepak Ramachandran

arXiv:2502.01930 · cs.LG · January 16, 2026

TL;DR

This paper introduces two distributionally robust algorithms, WDPO and KLDPO, to improve large language model alignment with human preferences under distribution shift, demonstrating superior performance on benchmark datasets.

Contribution

The paper develops two distributionally robust DPO algorithms for LLM alignment, WDPO and KLDPO, that target preference distribution shift, together with scalable learning methods for training them.
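As a point of reference, a generic distributionally robust DPO objective can be written as below. This is a sketch under standard assumptions: the usual DPO loss with inverse temperature β and reference policy π_ref, and an uncertainty set of radius ρ around the nominal preference distribution P^o, measured by KL divergence or a p-Wasserstein distance W_p (the choice of p and ground metric is assumed here). The exact sets, distances, and conditions that define WDPO and KLDPO are given in the paper and may differ in detail.

```latex
\ell(\theta; x, y_w, y_l) = -\log \sigma\Big(
  \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
  - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} \Big)

\theta^\star \in \arg\min_{\theta} \; \sup_{P \in \mathcal{B}_\rho(P^{o})}
  \mathbb{E}_{(x, y_w, y_l) \sim P}\big[\ell(\theta; x, y_w, y_l)\big]

\mathcal{B}^{\mathrm{KL}}_\rho(P^{o}) = \{P : D_{\mathrm{KL}}(P \,\|\, P^{o}) \le \rho\},
\qquad
\mathcal{B}^{W}_\rho(P^{o}) = \{P : W_p(P, P^{o}) \le \rho\}
```

Setting ρ = 0 shrinks either set to {P^o} and recovers ordinary DPO on the nominal data.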

Findings

01. WDPO and KLDPO outperform existing methods under preference distribution shifts.
02. The algorithms are scalable and suitable for large models.
03. Empirical results show significant alignment improvements.

Abstract

A major challenge in aligning large language models (LLMs) with human preferences is the issue of distribution shift. LLM alignment algorithms rely on static preference datasets, assuming that they accurately represent real-world user preferences. However, user preferences vary significantly across geographical regions, demographics, linguistic patterns, and evolving cultural trends. This preference distribution shift leads to catastrophic alignment failures in many real-world applications. We address this problem using the principled framework of distributionally robust optimization, and develop two novel distributionally robust direct preference optimization (DPO) algorithms, namely, Wasserstein DPO (WDPO) and Kullback-Leibler DPO (KLDPO). We characterize the sample complexity of learning the optimal policy parameters for WDPO and KLDPO. Moreover, we propose scalable gradient…
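The KL-constrained worst case behind a method like KLDPO can be illustrated with the textbook dual of a KL ball. The Python sketch below is an illustration under stated assumptions, not the authors' algorithm. It treats a batch of per-example DPO losses as a uniform empirical distribution P_n and evaluates sup over {Q : KL(Q || P_n) ≤ ρ} of E_Q[ℓ] through the convex dual inf over λ > 0 of λρ + λ log E_{P_n}[exp(ℓ/λ)], minimized here by golden-section search over log λ. The function names (`dpo_loss`, `kl_robust_loss`), the search bounds, and the uniform batch weighting are hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple


def dpo_loss(logp_w: float, logp_l: float, ref_w: float, ref_l: float,
             beta: float = 0.1) -> float:
    """Per-example DPO loss: -log sigmoid(beta * implicit reward margin)."""
    margin = beta * ((logp_w - ref_w) - (logp_l - ref_l))
    # -log(sigmoid(m)) == softplus(-m); this form is stable for large |m|.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def _logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def kl_dual_objective(losses: Sequence[float], rho: float, lam: float) -> float:
    """Dual of the KL-ball worst case: lam*rho + lam*log(mean(exp(loss/lam)))."""
    scaled = [v / lam for v in losses]
    return lam * rho + lam * (_logsumexp(scaled) - math.log(len(losses)))


def kl_robust_loss(losses: Sequence[float], rho: float,
                   lam_min: float = 1e-4, lam_max: float = 1e4,
                   iters: int = 100) -> Tuple[float, List[float]]:
    """Worst-case mean loss over {Q : KL(Q || P_n) <= rho}, P_n uniform on the batch.

    The dual is convex in lam, hence unimodal in log(lam), so golden-section
    search over [log(lam_min), log(lam_max)] finds its minimum. Returns the
    robust loss and the worst-case weights q_i proportional to exp(loss_i / lam*).
    """
    def g(t: float) -> float:
        return kl_dual_objective(losses, rho, math.exp(t))

    a, b = math.log(lam_min), math.log(lam_max)
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    gc, gd = g(c), g(d)
    for _ in range(iters):
        if gc <= gd:  # minimum lies in [a, d]
            b, d, gd = d, c, gc
            c = b - inv_phi * (b - a)
            gc = g(c)
        else:  # minimum lies in [c, b]
            a, c, gc = c, d, gd
            d = a + inv_phi * (b - a)
            gd = g(d)
    lam = math.exp((a + b) / 2.0)
    scaled = [v / lam for v in losses]
    lse = _logsumexp(scaled)
    weights = [math.exp(s - lse) for s in scaled]
    return kl_dual_objective(losses, rho, lam), weights


if __name__ == "__main__":
    batch = [dpo_loss(-1.0, -2.0, -1.5, -1.5),
             dpo_loss(-3.0, -1.0, -2.0, -2.0, beta=0.5)]
    robust, q = kl_robust_loss(batch, rho=0.1)
    print(f"mean={sum(batch) / len(batch):.4f} robust={robust:.4f} weights={q}")
```

The returned weights describe the worst-case distribution, so they upweight the preference pairs the model currently fits worst. The robust loss always lies between the batch mean (ρ = 0) and the batch maximum (large ρ). In an autodiff framework one could, under the same assumptions, hold these weights fixed and backpropagate through Σ q_i ℓ_i, which by Danskin's theorem matches the gradient of the worst-case objective. This is a generic DRO device, not a description of the paper's gradient algorithm.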

Taxonomy

Topics: Multi-Criteria Decision Making · Advanced Statistical Process Monitoring · Advanced Control Systems Optimization

Methods: Direct Preference Optimization