A Unified Theoretical Analysis of Private and Robust Offline Alignment: from RLHF to DPO

Xingyu Zhou; Yulian Wu; Francesco Orabona

arXiv:2505.15694·cs.LG·May 22, 2025

TL;DR

This paper provides a unified theoretical framework analyzing how privacy and adversarial corruption affect offline alignment methods like RLHF and DPO, revealing key differences between privacy-first and corruption-first scenarios.

Contribution

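The paper introduces a reduction framework under linear modeling assumptions to analyze the interplay between privacy and robustness, and establishes a separation between the LTC (local differential privacy, then corruption) and CTL (corruption, then local differential privacy) scenarios in offline alignment.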

Findings

01. LTC is more challenging than CTL in offline alignment.

02. The reduction framework links offline alignment to logistic regression parameter estimation.

03. Advances theoretical understanding of privacy and robustness in offline alignment.
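
The link to logistic regression in the second finding can be illustrated with a minimal sketch. It assumes a Bradley-Terry preference model with a linear reward, which is a common choice in this literature but is not spelled out in this summary; the names `theta`, `phi_a1`, `phi_a2`, and `preference_prob` are hypothetical and chosen for illustration only.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def preference_prob(theta, phi_a1, phi_a2):
    """Probability that response a1 is preferred over a2.

    Assumption: reward is linear in features, r(x, a) = <theta, phi(x, a)>,
    and preferences follow a Bradley-Terry model. The preference label is
    then a logistic-regression label whose covariate is the feature
    difference phi(x, a1) - phi(x, a2).
    """
    diff = [u - v for u, v in zip(phi_a1, phi_a2)]
    logit = sum(t * d for t, d in zip(theta, diff))
    return sigmoid(logit)


# Hypothetical two-dimensional example.
theta = [1.0, 0.0]
p = preference_prob(theta, [2.0, 0.0], [1.0, 3.0])
```

Under these assumptions, estimating `theta` from preference labels is ordinary logistic regression on feature differences, so noise in the labels translates directly into noise in a logistic regression problem.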

Abstract

In this paper, we theoretically investigate the effects of noisy labels in offline alignment, with a focus on the interplay between privacy and robustness against adversarial corruption. Specifically, under linear modeling assumptions, we present a unified analysis covering both reinforcement learning from human feedback (RLHF) and direct preference optimization (DPO) under different privacy-corruption scenarios, such as Local differential privacy-then-Corruption (LTC), where human preference labels are privatized before being corrupted by an adversary, and Corruption-then-Local differential privacy (CTL), where labels are corrupted before privacy protection. Our analysis leverages a reduction framework that reduces the offline alignment problem under linear modeling assumptions to parameter estimation in logistic regression. This framework allows us to establish an interesting…
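
The difference between the two orderings described in the abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the paper's construction: it assumes binary preference labels, uses randomized response as the local differential privacy mechanism, and models the adversary as overwriting a fixed prefix of the labels. The names `randomized_response`, `corrupt`, `ltc`, and `ctl`, and the parameters `epsilon` and `alpha`, are hypothetical.

```python
import math
import random


def randomized_response(label, epsilon, rng):
    """Keep a binary label with probability e^eps / (1 + e^eps), else flip it."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return label if rng.random() < keep_prob else 1 - label


def corrupt(labels, alpha, target=0):
    """Toy adversary (assumption): overwrite the first floor(alpha * n) labels."""
    k = int(alpha * len(labels))
    return [target] * k + list(labels[k:])


def ltc(labels, epsilon, alpha, rng):
    """LTC: privatize every label first, then let the adversary corrupt."""
    private = [randomized_response(y, epsilon, rng) for y in labels]
    return corrupt(private, alpha)


def ctl(labels, epsilon, alpha, rng):
    """CTL: the adversary corrupts first, then every label is privatized."""
    corrupted = corrupt(labels, alpha)
    return [randomized_response(y, epsilon, rng) for y in corrupted]


rng = random.Random(0)
clean = [1] * 1000
ltc_out = ltc(clean, epsilon=1.0, alpha=0.1, rng=rng)
ctl_out = ctl(clean, epsilon=1.0, alpha=0.1, rng=rng)
```

In this sketch the adversary acts last under LTC, so the labels it writes reach the learner unchanged, whereas under CTL the corrupted labels are themselves randomized before the learner sees them.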

Taxonomy

Topics: Auction Theory and Applications

Methods: Focus