On Design Principles for Private Adaptive Optimizers
Arun Ganesh, Brendan McMahan, Abhradeep Thakurta

TL;DR
This paper critically examines private adaptive optimizers and finds that a simple scale-then-privatize technique, which does not produce unbiased second moment estimates, has more desirable theoretical behavior and outperforms the other variants studied on a small-scale language model.
Contribution
It challenges the common belief that unbiased second moment estimates are essential, proposing and validating a simple scale-then-privatize approach with superior performance and theoretical advantages.
Findings
Scale-then-privatize outperforms all other variants studied on a small-scale language model.
Unbiased second moment estimates are not necessary for effective private optimization.
The proposed method aligns better with correlated noise mechanisms in practice.
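To make the "unbiased second moment" point concrete, here is a minimal sketch of why DP noise biases a naive second moment estimate: for zero-mean Gaussian noise z with standard deviation sigma, E[(g + z)^2] = g^2 + sigma^2 per coordinate. The function name, the toy gradient, and the value of sigma are all illustrative assumptions, not taken from the paper.

```python
import numpy as np

def noisy_second_moment(grad, sigma, num_samples, seed=0):
    """Average of (grad + noise)^2 over many independent Gaussian noise draws.

    Hypothetical helper for illustration only; not from the paper.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, sigma, size=(num_samples, grad.shape[0]))
    return np.mean((grad + noise) ** 2, axis=0)

# Toy gradient with one large, one small, and one zero coordinate (assumed values).
grad = np.array([1.0, 0.1, 0.0])
sigma = 0.5

estimate = noisy_second_moment(grad, sigma, num_samples=200_000)
# Subtracting sigma^2 removes the bias in expectation; this is the kind of
# correction that "unbiased second moment" approaches aim for.
debiased = estimate - sigma ** 2
```

The bias sigma^2 is the same in every coordinate, so it dominates exactly where the true gradient is small. The paper's finding is that removing this bias is not necessary for effective private optimization.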
Abstract
The spherical noise added to gradients in differentially private (DP) training undermines the performance of adaptive optimizers like AdaGrad and Adam, and hence many recent works have proposed algorithms to address this challenge. However, the empirical results in these works focus on simple tasks and models and the conclusions may not generalize to model training in practice. In this paper we survey several of these variants, and develop better theoretical intuition for them as well as perform empirical studies comparing them. We find that a common intuition of aiming for unbiased estimates of second moments of gradients in adaptive optimizers is misguided, and instead that a simple technique called scale-then-privatize (which does not achieve unbiased second moments) has more desirable theoretical behaviors and outperforms all other variants we study on a small-scale language model…
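The abstract names scale-then-privatize without defining it here. The sketch below shows one plausible reading, assumed for illustration: per-example gradients are divided by a given preconditioner before clipping and noising, in contrast to clipping and noising first and preconditioning afterwards. All function names, the parameters `clip_norm` and `noise_multiplier`, and the way the preconditioner is supplied are assumptions, not the authors' implementation.

```python
import numpy as np

def clip_rows(x, clip_norm):
    """Scale each row of x so its L2 norm is at most clip_norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return x * factors

def scale_then_privatize(per_example_grads, precond, clip_norm, noise_multiplier, rng):
    """Assumed reading: precondition first, then clip and add spherical noise."""
    scaled = per_example_grads / precond
    clipped = clip_rows(scaled, clip_norm)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=precond.shape)
    return (clipped.sum(axis=0) + noise) / per_example_grads.shape[0]

def privatize_then_scale(per_example_grads, precond, clip_norm, noise_multiplier, rng):
    """Contrast case: clip and add spherical noise first, then precondition."""
    clipped = clip_rows(per_example_grads, clip_norm)
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=precond.shape)
    return (clipped.sum(axis=0) + noise) / per_example_grads.shape[0] / precond

# Toy inputs (assumed): 8 examples, 4 parameters, a fixed diagonal preconditioner.
rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 4))
precond = np.array([1.0, 2.0, 4.0, 8.0])

update_stp = scale_then_privatize(grads, precond, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
update_pts = privatize_then_scale(grads, precond, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

In this sketch, the privatize-then-scale update divides the noise by the preconditioner, so coordinates with a small preconditioner receive amplified noise. Under scale-then-privatize the noise stays spherical in the preconditioned space. How the preconditioner itself is estimated privately is left out of the sketch.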
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Advanced Optimization Algorithms Research
