On Design Principles for Private Adaptive Optimizers

Arun Ganesh; Brendan McMahan; Abhradeep Thakurta

arXiv:2507.01129 · cs.LG · July 3, 2025

TL;DR

This paper critically examines private adaptive optimizers and finds that a simple scale-then-privatize technique outperforms the other variants studied on a small-scale language model, while also showing more desirable theoretical behavior for differentially private training.

Contribution

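The paper challenges the common intuition that private adaptive optimizers should aim for unbiased second moment estimates. It proposes and validates a simple scale-then-privatize approach, which does not achieve unbiased second moments, yet performs better empirically and has more desirable theoretical behavior.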

Findings

01. Scale-then-privatize outperforms the other variants studied when training a small-scale language model.

02. Unbiased second moment estimates are not necessary for effective private optimization.

03. The proposed method aligns better with correlated noise mechanisms in practice.

Abstract

The spherical noise added to gradients in differentially private (DP) training undermines the performance of adaptive optimizers like AdaGrad and Adam, and hence many recent works have proposed algorithms to address this challenge. However, the empirical results in these works focus on simple tasks and models and the conclusions may not generalize to model training in practice. In this paper we survey several of these variants, and develop better theoretical intuition for them as well as perform empirical studies comparing them. We find that a common intuition of aiming for unbiased estimates of second moments of gradients in adaptive optimizers is misguided, and instead that a simple technique called scale-then-privatize (which does not achieve unbiased second moments) has more desirable theoretical behaviors and outperforms all other variants we study on a small-scale language model…
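The abstract names scale-then-privatize without spelling out the algorithm, so the following is only a minimal sketch of one plausible reading: each per-example gradient is multiplied by a preconditioner first, and only then clipped and perturbed with spherical Gaussian noise. The function names, the `scale` vector (standing in for something like 1 / (sqrt(v) + eps) built from a running second-moment estimate), and the averaging convention are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def clip(vec, max_norm):
    """Shrink vec so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    factor = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * factor for x in vec]


def privatize(per_example_grads, clip_norm, noise_multiplier, rng):
    """Standard DP step: clip each gradient, sum, add spherical noise, average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, x in enumerate(clip(grad, clip_norm)):
            total[i] += x
    sigma = noise_multiplier * clip_norm  # same noise scale in every coordinate
    n = len(per_example_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]


def scale_then_privatize(per_example_grads, scale, clip_norm, noise_multiplier, rng):
    """Hypothetical reading: precondition per-example gradients, then privatize.

    `scale` is an assumed per-coordinate preconditioner; how the paper
    computes it is not stated in the abstract.
    """
    scaled = [[s * x for s, x in zip(scale, grad)] for grad in per_example_grads]
    return privatize(scaled, clip_norm, noise_multiplier, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.5, -2.0]]
    scale = [1.0, 0.25]  # illustrative values only
    print(scale_then_privatize(grads, scale, clip_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Under this reading, clipping and noise are applied in the preconditioned coordinates rather than in the raw gradient space, which is the ordering the name suggests. For comparison, the bias that other variants try to remove comes from squaring a noisy gradient: if z is zero-mean noise with variance sigma^2, the expectation of (g + z)^2 is g^2 + sigma^2 rather than g^2.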

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Optimization Algorithms Research