Improved Bounds for Private and Robust Alignment

Wenqian Weng; Yi He; Xingyu Zhou

arXiv:2512.23816 · cs.LG · January 1, 2026

TL;DR

This paper establishes new theoretical bounds for private and robust language model alignment, showing that an MLE-style algorithm with log loss is near-optimal under privacy and giving improved guarantees in offline and online settings under privacy and adversarial corruption.

Contribution

It provides the first theoretical analysis of private and robust online alignment, along with improved offline bounds and new uniform convergence guarantees under privacy and corruption.

Findings

01. An MLE-style algorithm with log loss achieves near-optimal rates in the privacy-only setting.
02. Existing offline algorithms offer stronger guarantees, in terms of both corruption level and privacy parameters, than previously known.
03. The first results for private and robust online alignment.
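
The first finding concerns an MLE-style algorithm with log loss under privacy. This page does not spell out the algorithm, so the following is only a sketch under assumptions of our own: a linear Bradley-Terry reward model on feature differences, preference labels privatized by randomized response (flip probability q = 1/(1+e^ε)), and an estimator that minimizes the log loss of the observed, privatized labels. The names `randomized_response` and `private_log_loss_mle`, the learning rate, and the synthetic data are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def randomized_response(label: int, epsilon: float, rng: random.Random) -> int:
    """Keep a binary label w.p. e^eps / (1 + e^eps), otherwise flip it."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return label if rng.random() < keep_prob else 1 - label


def private_log_loss_mle(features, noisy_labels, epsilon, lr=2.0, steps=200):
    """Gradient descent on the log loss of the observed (privatized) labels.

    Hypothetical sketch: with flip probability q, the chance of observing
    label 1 is p = (1 - q) * s + q * (1 - s), where s = sigmoid(theta . x).
    """
    q = 1.0 / (1.0 + math.exp(epsilon))
    n, d = len(features), len(features[0])
    theta = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in zip(features, noisy_labels):
            s = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
            p = (1 - q) * s + q * (1 - s)  # stays in [q, 1 - q], so divisions are safe
            dp_dz = (1 - 2 * q) * s * (1 - s)
            dloss_dp = -(y / p - (1 - y) / (1 - p))
            for j in range(d):
                grad[j] += dloss_dp * dp_dz * x[j] / n
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta


# Usage on synthetic data: x stands for a feature difference phi(a1) - phi(a2),
# and y = 1 means response a1 was preferred (illustrative assumption).
rng = random.Random(0)
true_theta = [1.0, -2.0]
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(2000)]
clean = [1 if rng.random() < sigmoid(true_theta[0] * x[0] + true_theta[1] * x[1]) else 0
         for x in X]
noisy = [randomized_response(y, 2.0, rng) for y in clean]
theta_hat = private_log_loss_mle(X, noisy, epsilon=2.0)
print(theta_hat)
```

Modeling the flip probability inside the likelihood, rather than fitting the privatized labels as if they were clean, keeps the estimator aimed at the true θ; plain logistic regression on the noisy labels would typically shrink the estimate toward zero.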

Abstract

In this paper, we study the private and robust alignment of language models from a theoretical perspective by establishing upper bounds on the suboptimality gap in both offline and online settings. We consider preference labels subject to privacy constraints and/or adversarial corruption, and analyze two distinct interplays between them: privacy-first and corruption-first. For the privacy-only setting, we show that log loss with an MLE-style algorithm achieves near-optimal rates, in contrast to conventional wisdom. For the joint privacy-and-corruption setting, we first demonstrate that existing offline algorithms in fact provide stronger guarantees -- simultaneously in terms of corruption level and privacy parameters -- than previously known, which further yields improved bounds in the corruption-only regime. In addition, we present the first set of results for private and robust…
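
The abstract distinguishes privacy-first and corruption-first interplays. One reading, assumed here and not spelled out on this page, is the order in which the two operations act on each preference label: privacy-first privatizes labels before an adversary corrupts them, while corruption-first corrupts the raw labels, which are then privatized. The toy sketch below uses randomized response for privacy and a hypothetical targeted adversary, `flip_ones`, that flips up to an α fraction of labels from 1 to 0; both are illustrative choices, not the paper's model.

```python
import math
import random


def randomized_response(label: int, epsilon: float, rng: random.Random) -> int:
    """Keep a binary label w.p. e^eps / (1 + e^eps), otherwise flip it."""
    keep_prob = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return label if rng.random() < keep_prob else 1 - label


def flip_ones(labels, alpha):
    """Hypothetical targeted adversary: flips up to alpha * n labels from 1 to 0."""
    budget = int(alpha * len(labels))
    out = []
    for y in labels:
        if y == 1 and budget > 0:
            out.append(0)
            budget -= 1
        else:
            out.append(y)
    return out


def privacy_first(labels, epsilon, alpha, rng):
    # Privatize each label, then let the adversary corrupt the released labels.
    return flip_ones([randomized_response(y, epsilon, rng) for y in labels], alpha)


def corruption_first(labels, epsilon, alpha, rng):
    # Let the adversary corrupt the raw labels, then privatize the result.
    return [randomized_response(y, epsilon, rng) for y in flip_ones(labels, alpha)]


labels = [1] * 10_000
pf = privacy_first(labels, epsilon=1.0, alpha=0.1, rng=random.Random(0))
cf = corruption_first(labels, epsilon=1.0, alpha=0.1, rng=random.Random(1))
print(sum(pf) / len(pf), sum(cf) / len(cf))
```

For an all-ones input with ε = 1 and α = 0.1, the expected fraction of released 1-labels is about 0.631 in privacy-first order and about 0.685 in corruption-first order: in the latter, randomized response also randomizes the adversary's flips, so their net effect is scaled by (1 - 2q). This is a property of this toy adversary, not a result from the paper.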

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Mobile Crowdsensing and Crowdsourcing