Provably Robust DPO: Aligning Language Models with Noisy Feedback

Sayak Ray Chowdhury; Anush Kini; Nagarajan Natarajan

arXiv:2403.00409 · cs.LG · April 15, 2024 · 1 citation

Open Access

TL;DR

This paper introduces a theoretically grounded robust preference optimization (rDPO) method that effectively mitigates the impact of noisy preference data in aligning language models with human interests, backed by formal guarantees and empirical validation.

Contribution

It proposes a novel loss function for policy optimization that is robust to noisy preferences and provides theoretical bounds on its sub-optimality gap under certain assumptions.
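To make the idea of a noise-robust loss concrete, here is a minimal sketch in plain Python. It assumes the usual debiasing construction for randomly flipped labels: the loss on the pair as labeled and the loss on the swapped pair are combined so that, in expectation over flips, the result equals the loss on the clean label. The function names, the `flip_rate` parameter (the flip probability, assumed known and below 0.5), and the scalar `margin` input are illustrative choices that do not come from this page, and the paper itself should be consulted for the exact form of the rDPO loss.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def implicit_margin(beta: float, logp_w: float, ref_logp_w: float,
                    logp_l: float, ref_logp_l: float) -> float:
    """DPO implicit reward margin between the preferred (w) and rejected (l)
    responses, given policy and reference log-probabilities."""
    return beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))


def dpo_loss(margin: float) -> float:
    """Standard DPO loss for one preference pair: -log sigmoid(margin)."""
    return -log_sigmoid(margin)


def robust_dpo_loss(margin: float, flip_rate: float) -> float:
    """Debiased loss under random preference flips (assumed construction).

    flip_rate is the assumed-known probability that a label was flipped.
    """
    if not 0.0 <= flip_rate < 0.5:
        raise ValueError("flip_rate must lie in [0, 0.5)")
    as_labeled = dpo_loss(margin)    # loss if the observed label is correct
    as_flipped = dpo_loss(-margin)   # loss if the observed label was flipped
    return ((1.0 - flip_rate) * as_labeled
            - flip_rate * as_flipped) / (1.0 - 2.0 * flip_rate)


# Hypothetical log-probabilities, for illustration only.
m = implicit_margin(beta=0.1, logp_w=-12.0, ref_logp_w=-13.0,
                    logp_l=-15.0, ref_logp_l=-14.0)
eps = 0.2
# Expected robust loss when the observed label is flipped with probability eps.
expected = (1 - eps) * robust_dpo_loss(m, eps) + eps * robust_dpo_loss(-m, eps)
clean = dpo_loss(m)
```

With `flip_rate=0.0` the sketch reduces to the standard DPO loss, and `expected` matches `clean` up to floating-point error, which is the unbiasedness property this construction is designed to provide.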

Findings

01. rDPO outperforms vanilla DPO in noisy settings

02. Theoretical sub-optimality gap scales with noise level and data size

03. Empirical results confirm robustness to preference label noise
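The scaling in the second finding can be written out as a rate. The form below is the one stated in the paper's full abstract, which is truncated on this page, so treat it as a pointer and not a derivation. The symbols are not defined in the excerpt here: \varepsilon is the label flip rate, d the dimension of the policy parameter, and n the number of preference pairs.

```latex
\text{sub-optimality gap}
  \;=\; O\!\left( \frac{1}{1 - 2\varepsilon} \sqrt{\frac{d}{n}} \right),
  \qquad 0 \le \varepsilon < \tfrac{1}{2}
```

Read this way, the gap shrinks as the dataset grows and widens as the flip rate approaches 1/2, where flipped labels carry no information about the true preference.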

Abstract

Learning from preference-based feedback has recently gained traction as a promising approach to align language models with human interests. While these aligned generative models have demonstrated impressive capabilities across various tasks, their dependence on high-quality human preference data poses a bottleneck in practical applications. Specifically, noisy (incorrect and ambiguous) preference pairs in the dataset might restrict the language models from capturing human intent accurately. While practitioners have recently proposed heuristics to mitigate the effect of noisy preferences, a complete theoretical understanding of their workings remains elusive. In this work, we aim to bridge this gap by introducing a general framework for policy optimization in the presence of random preference flips. We focus on the direct preference optimization (DPO) algorithm in particular since it…

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems

Methods: Direct Preference Optimization · Focus · Shrink and Fine-Tune · FLIP · ALIGN