Inference and Interference: The Role of Clipping, Pruning and Loss Landscapes in Differentially Private Stochastic Gradient Descent
Lauren Watson, Eric Gan, Mohan Dantam, Baharan Mirzasoleiman, Rik Sarkar

TL;DR
This paper investigates the dynamics of DP-SGD, revealing that clipping impacts training more than noise, and demonstrates that magnitude pruning can enhance DP-SGD performance in large neural networks.
Contribution
The study provides a detailed analysis of DP-SGD's behavior, highlighting the dominant role of clipping over noise and proposing pruning as a method to improve privacy-preserving training.
Findings
Clipping has a larger impact than noise on DP-SGD performance.
Heavy pruning can improve test accuracy of DP-SGD.
Behavior in later training stages determines overall results.
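The pruning finding refers to magnitude pruning, which removes the weights with the smallest absolute values. The sketch below shows one generic, global form of that idea in NumPy. The function name `magnitude_prune` and the `sparsity` parameter are illustrative assumptions, and this summary does not say how or when the paper applies pruning during DP-SGD training.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of smallest-magnitude weights.

    Hypothetical helper for illustration; not the paper's procedure.
    Returns the pruned weights and the boolean keep-mask.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # The k-th smallest magnitude acts as the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    # Keep only weights strictly above the threshold; ties are pruned too,
    # so slightly more than k weights can be removed.
    mask = np.abs(weights) > threshold
    return weights * mask, mask


w = np.array([[0.1, -2.0], [0.5, -0.05]])
pruned, mask = magnitude_prune(w, sparsity=0.5)
```

With `sparsity=0.5`, the two smallest-magnitude entries (0.1 and -0.05) are zeroed and the two largest are kept. Keeping the mask lets later updates be restricted to the surviving weights, which lowers the effective dimension of the problem.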
Abstract
Differentially private stochastic gradient descent (DP-SGD) is known to have poorer training and test performance on large neural networks, compared to ordinary stochastic gradient descent (SGD). In this paper, we perform a detailed study and comparison of the two processes and unveil several new insights. By comparing the behavior of the two processes separately in early and late epochs, we find that while DP-SGD makes slower progress in early stages, it is the behavior in the later stages that determines the end result. This separate analysis of the clipping and noise addition steps of DP-SGD shows that while noise introduces errors to the process, gradient descent can recover from these errors when it is not clipped, and clipping appears to have a larger impact than noise. These effects are amplified in higher dimensions (large neural networks), where the loss basin occupies a lower…
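The two DP-SGD steps the abstract analyzes separately, clipping and noise addition, can be sketched as follows. This is a minimal NumPy illustration of the standard DP-SGD update, not the authors' code. The names `clip_per_example`, `dp_sgd_step`, `clip_norm`, and `noise_multiplier` are assumptions chosen for this example.

```python
import numpy as np


def clip_per_example(grads, clip_norm):
    """Scale each per-example gradient (one per row) to L2 norm <= clip_norm."""
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    # Rows already inside the clipping ball are left unchanged (scale = 1).
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    return grads * scale


def dp_sgd_step(params, grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update: clip per example, sum, add Gaussian noise, average."""
    clipped = clip_per_example(grads, clip_norm)
    # Noise standard deviation is proportional to the clipping norm.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / grads.shape[0]
    return params - lr * noisy_mean


rng = np.random.default_rng(0)
grads = np.array([[3.0, 4.0], [0.3, 0.4]])  # per-example norms: 5.0 and 0.5
clipped = clip_per_example(grads, clip_norm=1.0)
# Setting noise_multiplier=0 isolates the effect of clipping alone.
new_params = dp_sgd_step(np.zeros(2), grads, lr=1.0, clip_norm=1.0,
                         noise_multiplier=0.0, rng=rng)
```

Setting `noise_multiplier=0.0` removes the noise and keeps only clipping. Skipping `clip_per_example` while keeping the noise does the reverse. Toggling the two steps this way is one simple way to compare their effects in isolation.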
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Machine Learning and ELM
Methods: Pruning
