Doubly Robust Alignment for Large Language Models

Erhan Xu; Kai Ye; Hongyi Zhou; Luhan Zhu; Francesco Quinzan; Chengchun Shi

arXiv:2506.01183·cs.LG·October 30, 2025

TL;DR

This paper introduces a doubly robust preference optimization algorithm for reinforcement learning from human feedback (RLHF). The algorithm remains consistent when either the preference model or the reference policy is correctly specified, which makes the alignment of large language models with human preferences more robust to model misspecification.

Contribution

The paper proposes a doubly robust algorithm that remains consistent when either the preference model or the reference policy is correctly specified, without requiring both, which makes it more robust than existing methods.

Findings

01 Performs better and more robustly than state-of-the-art algorithms.

02 Remains consistent in theory when only one of the preference model and the reference policy is correctly specified.

03 Shows stronger practical results when aligning language models.

Abstract

This paper studies reinforcement learning from human feedback (RLHF) for aligning large language models with human preferences. While RLHF has demonstrated promising results, many algorithms are highly sensitive to misspecifications in the underlying preference model (e.g., the Bradley-Terry model), the reference policy, or the reward function, resulting in undesirable fine-tuning. To address model misspecification, we propose a doubly robust preference optimization algorithm that remains consistent when either the preference model or the reference policy is correctly specified (without requiring both). Our proposal demonstrates superior and more robust performance than state-of-the-art algorithms, both in theory and in practice. The code is available at https://github.com/DRPO4LLM/DRPO4LLM
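
The "doubly robust" property described in the abstract can be illustrated with a toy example. The sketch below is not the paper's estimator: it is a generic doubly robust value estimate for a three-armed bandit, where a reward model stands in for the preference model and a behavior policy stands in for the reference policy. All names and numbers (`doubly_robust_value`, `true_reward`, `behavior`, `target`) are hypothetical and chosen only for illustration. The estimate combines a model-based term with an importance-weighted residual correction, so it should land near the true value when either nuisance model is correct, and drift when both are wrong.

```python
import numpy as np


def doubly_robust_value(actions, rewards, target_probs, behavior_probs_hat, reward_hat):
    """Generic doubly robust estimate of a target policy's value in a K-armed bandit.

    Illustrative only: this is a textbook-style estimator, not the method
    from the paper.
    """
    actions = np.asarray(actions)
    rewards = np.asarray(rewards, dtype=float)
    # Direct-method term: value predicted by the (possibly wrong) reward model.
    direct = float(np.dot(target_probs, reward_hat))
    # Correction term: importance-weighted residuals of the reward model,
    # using the (possibly wrong) estimate of the behavior policy.
    weights = target_probs[actions] / behavior_probs_hat[actions]
    correction = float(np.mean(weights * (rewards - reward_hat[actions])))
    return direct + correction


# Hypothetical ground truth for the toy problem.
rng = np.random.default_rng(0)
true_reward = np.array([0.2, 0.5, 0.9])
behavior = np.array([0.5, 0.3, 0.2])   # policy that generated the data
target = np.array([0.1, 0.2, 0.7])     # policy whose value we want
n = 200_000

actions = rng.choice(3, size=n, p=behavior)
rewards = true_reward[actions] + rng.normal(0.0, 0.1, size=n)
true_value = float(np.dot(target, true_reward))

# Deliberately misspecified nuisance models.
wrong_reward = np.zeros(3)
wrong_behavior = np.full(3, 1.0 / 3.0)

# Reward model correct, behavior policy wrong.
dr_reward_ok = doubly_robust_value(actions, rewards, target, wrong_behavior, true_reward)
# Behavior policy correct, reward model wrong.
dr_behavior_ok = doubly_robust_value(actions, rewards, target, behavior, wrong_reward)
# Both wrong: no consistency guarantee.
dr_both_wrong = doubly_robust_value(actions, rewards, target, wrong_behavior, wrong_reward)

print(f"true value:          {true_value:.3f}")
print(f"reward model ok:     {dr_reward_ok:.3f}")
print(f"behavior policy ok:  {dr_behavior_ok:.3f}")
print(f"both misspecified:   {dr_both_wrong:.3f}")
```

In this toy setting the first two estimates should sit close to the true value while the third should not, which mirrors the "either one, not necessarily both" condition stated in the abstract. The paper's algorithm targets preference data and language model policies, so its estimator differs in form; the authors' code at the repository linked above is the place to look for the actual method.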

Videos

Doubly Robust Alignment for Large Language Models · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling