Optimizer Dynamics at the Edge of Stability with Differential Privacy
Ayana Hussain, Ricky Fang

TL;DR
This paper investigates how differential privacy affects neural network training dynamics. It finds that the privacy-preserving modifications (per-example gradient clipping and Gaussian noise) alter stability patterns, but some characteristic Edge of Stability behaviors persist.
Contribution
It provides a detailed analysis of how differential privacy impacts optimizer stability and sharpness dynamics during training.
Findings
DP reduces sharpness and alters stability thresholds
Patterns of Edge of Stability persist under DP conditions
With large privacy budgets, training can approach classical stability limits
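To make "sharpness" and "stability threshold" concrete, here is a minimal sketch on a toy quadratic loss, not the paper's experimental setup. It estimates sharpness (the largest Hessian eigenvalue) by power iteration and compares plain gradient descent just below and just above the well-known classical threshold, learning rate = 2 / sharpness. The matrix `H`, the helper names, and the step counts are illustrative assumptions.

```python
import math

# Hypothetical toy loss L(theta) = 0.5 * theta^T H theta; its Hessian is H.
H = [[3.0, 1.0],
     [1.0, 2.0]]

def hessian_vector_product(v):
    """Multiply the (constant) Hessian H by a vector v."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def power_iteration(matvec, dim, iters=200):
    """Estimate the dominant eigenvalue of a symmetric operator."""
    v = [1.0 / math.sqrt(dim)] * dim
    eig = 0.0
    for _ in range(iters):
        w = matvec(v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        # Rayleigh quotient of the normalized iterate.
        eig = sum(a * b for a, b in zip(v, matvec(v)))
    return eig

def loss(theta):
    return 0.5 * sum(t * g for t, g in zip(theta, hessian_vector_product(theta)))

def run_gd(lr, steps=100):
    """Run plain gradient descent from theta = (1, 1) and return the final loss."""
    theta = [1.0, 1.0]
    for _ in range(steps):
        grad = hessian_vector_product(theta)  # gradient of the quadratic is H @ theta
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return loss(theta)

sharpness = power_iteration(hessian_vector_product, dim=2)
threshold = 2.0 / sharpness  # classical GD stability limit on the learning rate

loss_below = run_gd(lr=0.9 * threshold)  # shrinks toward zero
loss_above = run_gd(lr=1.1 * threshold)  # grows without bound
```

On a real network the Hessian is never formed explicitly; the same power iteration would be driven by Hessian-vector products from automatic differentiation.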
Abstract
Deep learning models can reveal sensitive information about individual training examples. Differential privacy (DP) provides guarantees that restrict such leakage, but it also alters optimization dynamics in poorly understood ways. We study the training dynamics of neural networks under DP by comparing Gradient Descent (GD) and Adam to their privacy-preserving variants. Prior work shows that these optimizers exhibit distinct stability dynamics: full-batch methods train at the Edge of Stability (EoS), while mini-batch and adaptive methods exhibit analogous edge-of-stability behavior. In these regimes, the training loss and the sharpness (the maximum eigenvalue of the training loss Hessian) exhibit characteristic behavior. In DP training, per-example gradient clipping and Gaussian noise modify the update rule, and it is unclear whether these stability patterns persist. We…
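The abstract says that per-example gradient clipping and Gaussian noise modify the update rule. The sketch below shows one common form of that modification, the standard DP-SGD-style recipe applied to a full batch. It is an assumption that the paper uses exactly this variant, and the names `clip`, `dp_gd_step`, `max_norm`, and `noise_multiplier` are hypothetical.

```python
import math
import random

def clip(grad, max_norm):
    """Rescale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0.0 else 1.0
    return [g * scale for g in grad]

def dp_gd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One full-batch DP gradient step: clip each example, sum, add noise, average."""
    n = len(per_example_grads)
    total = [0.0] * len(params)
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    # Gaussian noise with standard deviation noise_multiplier * max_norm is
    # added to the clipped sum before dividing by the batch size.
    sigma = noise_multiplier * max_norm
    noisy_mean = [(t + rng.gauss(0.0, sigma)) / n for t in total]
    return [p - lr * g for p, g in zip(params, noisy_mean)]

# Usage: two hypothetical per-example gradients; only the first exceeds the clip norm.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
noiseless = dp_gd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                       noise_multiplier=0.0, rng=rng)
noisy = dp_gd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                   noise_multiplier=1.0, rng=rng)
```

Setting `noise_multiplier` to zero and `max_norm` very large recovers plain gradient descent, which is the sense in which a large privacy budget moves the update back toward the non-private rule.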
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning
