Improved Bounds for Private and Robust Alignment
Wenqian Weng, Yi He, Xingyu Zhou

TL;DR
This paper establishes new upper bounds on the suboptimality gap for private and robust language model alignment. It shows that near-optimal rates are achievable in the privacy-only setting and gives improved guarantees in offline and online settings under privacy constraints and adversarial corruption.
Contribution
It provides the first theoretical analysis of private and robust online alignment, along with improved offline bounds and new uniform convergence guarantees under privacy and corruption.
Findings
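In the privacy-only setting, log loss with an MLE-style algorithm achieves near-optimal rates.
Existing offline algorithms provide stronger guarantees, in both corruption level and privacy parameters, than previously known.
The paper presents the first results for private and robust online alignment.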
Abstract
In this paper, we study the private and robust alignment of language models from a theoretical perspective by establishing upper bounds on the suboptimality gap in both offline and online settings. We consider preference labels subject to privacy constraints and/or adversarial corruption, and analyze two distinct interplays between them: privacy-first and corruption-first. For the privacy-only setting, we show that log loss with an MLE-style algorithm achieves near-optimal rates, in contrast to conventional wisdom. For the joint privacy-and-corruption setting, we first demonstrate that existing offline algorithms in fact provide stronger guarantees, simultaneously in terms of corruption level and privacy parameters, than previously known, which further yields improved bounds in the corruption-only regime. In addition, we present the first set of results for private and robust…
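To make "preference labels subject to privacy constraints" and "log loss" concrete, here is a minimal Python sketch. It rests on assumptions the abstract does not state: that labels are binary, and that privacy is enforced by randomized response, a standard mechanism for label differential privacy. All function names are hypothetical, and this is an illustration of the general idea rather than the paper's algorithm.

```python
import math
import random

def flip_probability(epsilon):
    """Probability of flipping a binary label under randomized response
    with privacy parameter epsilon (assumed mechanism, not from the paper)."""
    return 1.0 / (1.0 + math.exp(epsilon))

def randomized_response(labels, epsilon, rng):
    """Flip each 0/1 preference label independently with the flip probability."""
    f = flip_probability(epsilon)
    return [1 - y if rng.random() < f else y for y in labels]

def noisy_preference_prob(p, epsilon):
    """Probability that the privatized label is 1 when the clean label
    is 1 with probability p."""
    f = flip_probability(epsilon)
    return (1.0 - f) * p + f * (1.0 - p)

def private_log_loss(probs, noisy_labels, epsilon):
    """Average log loss of the privatized labels, with each model
    probability first passed through the flip channel."""
    total = 0.0
    for p, z in zip(probs, noisy_labels):
        q = noisy_preference_prob(p, epsilon)
        total -= math.log(q if z == 1 else 1.0 - q)
    return total / len(noisy_labels)

# Usage: privatize four clean labels, then score a model's predictions.
rng = random.Random(0)
noisy = randomized_response([1, 0, 1, 1], epsilon=1.0, rng=rng)
loss = private_log_loss([0.9, 0.2, 0.7, 0.6], noisy, epsilon=1.0)
```

Minimizing this loss over a model class would be one MLE-style estimator fit directly to the privatized labels. For any finite epsilon the channel keeps the noisy probability strictly between 0 and 1, so the logarithm stays finite.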
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Mobile Crowdsensing and Crowdsourcing
