Efficient and Scalable Implementation of Differentially Private Deep Learning without Shortcuts
Sebastian Rodriguez Beltran, Marlon Tobaben, Joonas Jälkö, Niki Loppi, Antti Honkela

TL;DR
This paper presents an efficient, scalable implementation of differentially private deep learning that correctly uses Poisson subsampling, demonstrating improved computational performance and scalability over naive methods.
Contribution
It introduces a new computationally efficient DP-SGD implementation with Poisson subsampling in JAX and benchmarks its performance and scalability.
Findings
The naive Opacus (PyTorch) implementation of DP-SGD has 2.6 to 8 times lower throughput than SGD.
Ghost Clipping reduces DP-SGD computational cost by about half.
DP-SGD scales better than SGD on up to 80 GPUs.
Abstract
Differentially private stochastic gradient descent (DP-SGD) is the standard algorithm for training machine learning models under differential privacy (DP). The most common DP-SGD privacy accountants rely on Poisson subsampling to ensure the theoretical DP guarantees. Implementing computationally efficient DP-SGD with Poisson subsampling is not trivial, which leads many implementations to take a shortcut by using computationally faster subsampling. We quantify the computational cost of training deep learning models under DP by implementing and benchmarking efficient methods with the correct Poisson subsampling. We find that using the naive implementation of DP-SGD with Opacus in PyTorch has a throughput between 2.6 and 8 times lower than that of SGD. However, efficient gradient clipping implementations like Ghost Clipping can roughly halve this cost. We propose an alternative…
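To make the terms in the abstract concrete, here is a minimal sketch of one DP-SGD step with Poisson subsampling, written in plain Python on a toy linear model. It is not the paper's JAX implementation or the Opacus API. All function names, the squared-error loss, and the choice to normalize by the expected batch size are assumptions made for illustration. The point it shows is that Poisson subsampling includes each example independently with probability q, so the batch size changes from step to step, which is what makes an efficient fixed-shape implementation non-trivial.

```python
import math
import random


def poisson_subsample(n, q, rng):
    """Include each of the n example indices independently with probability q."""
    return [i for i in range(n) if rng.random() < q]


def clip(grad, max_norm):
    """Rescale grad so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def per_example_grad(w, x, y):
    """Gradient of 0.5 * (w.x - y)^2 w.r.t. w (toy model, an assumption)."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def dp_sgd_step(w, xs, ys, q, max_norm, noise_mult, lr, rng):
    """One illustrative DP-SGD step; returns (new weights, sampled batch size)."""
    batch = poisson_subsample(len(xs), q, rng)  # variable-size batch
    total = [0.0] * len(w)
    for i in batch:
        g = clip(per_example_grad(w, xs[i], ys[i]), max_norm)
        total = [t + gi for t, gi in zip(total, g)]
    # Gaussian noise with standard deviation noise_mult * max_norm per coordinate.
    noisy = [t + rng.gauss(0.0, noise_mult * max_norm) for t in total]
    # Assumption: normalize by the expected batch size q * n, not the sampled size.
    expected_batch = q * len(xs)
    new_w = [wi - lr * ni / expected_batch for wi, ni in zip(w, noisy)]
    return new_w, len(batch)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
    ys = [2.0 * x[0] - 1.0 * x[1] for x in xs]
    w = [0.0, 0.0]
    for step in range(5):
        w, k = dp_sgd_step(w, xs, ys, q=0.1, max_norm=1.0,
                           noise_mult=1.0, lr=0.5, rng=rng)
        print(f"step {step}: sampled batch size {k}")
```

The loop over the sampled batch computes and clips each per-example gradient one at a time, which is the naive approach. Methods such as Ghost Clipping, mentioned in the abstract, aim to reduce the cost of that per-example clipping step.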
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques
Methods: Gradient Clipping · Stochastic Gradient Descent
