Efficient and Scalable Implementation of Differentially Private Deep Learning without Shortcuts

Sebastian Rodriguez Beltran; Marlon Tobaben; Joonas Jälkö; Niki Loppi; Antti Honkela

arXiv:2406.17298 · cs.LG · January 14, 2026

TL;DR

This paper presents an efficient, scalable implementation of differentially private deep learning that correctly uses Poisson subsampling, demonstrating improved computational performance and scalability over naive methods.

Contribution

It introduces a new computationally efficient DP-SGD implementation with Poisson subsampling in JAX and benchmarks its performance and scalability.

Findings

01 Naive DP-SGD with Opacus in PyTorch has a throughput 2.6 to 8 times lower than that of SGD.

02 Efficient gradient clipping implementations such as Ghost Clipping roughly halve this cost.

03 DP-SGD scales better than SGD on up to 80 GPUs.

Abstract

Differentially private stochastic gradient descent (DP-SGD) is the standard algorithm for training machine learning models under differential privacy (DP). The most common DP-SGD privacy accountants rely on Poisson subsampling to ensure the theoretical DP guarantees. Implementing computationally efficient DP-SGD with Poisson subsampling is not trivial, which leads many implementations to take a shortcut by using computationally faster subsampling. We quantify the computational cost of training deep learning models under DP by implementing and benchmarking efficient methods with the correct Poisson subsampling. We find that the naive implementation of DP-SGD with Opacus in PyTorch has a throughput between 2.6 and 8 times lower than that of SGD. However, efficient gradient clipping implementations like Ghost Clipping can roughly halve this cost. We propose an alternative…
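To make the two ingredients named in the abstract concrete, here is a minimal sketch of one DP-SGD step with Poisson subsampling and per-example gradient clipping. It is not the paper's implementation: it uses plain Python on a toy least-squares model, and the function names (`poisson_subsample`, `clip`, `dp_sgd_step`), the parameters, and the choice to normalise by the expected batch size are all assumptions made for illustration.

```python
import math
import random


def poisson_subsample(n, q, rng):
    """Include each of n example indices independently with probability q."""
    return [i for i in range(n) if rng.random() < q]


def clip(grad, max_norm):
    """Scale grad down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_step(w, xs, ys, q, max_norm, noise_mult, lr, rng):
    """One DP-SGD step for a toy linear model (hypothetical helper)."""
    batch = poisson_subsample(len(xs), q, rng)
    total = [0.0] * len(w)
    for i in batch:
        # Per-example gradient of 0.5 * (w.x - y)^2 with respect to w.
        err = sum(wj * xj for wj, xj in zip(w, xs[i])) - ys[i]
        g = clip([err * xj for xj in xs[i]], max_norm)
        total = [t + gj for t, gj in zip(total, g)]
    # Gaussian noise scaled to the clipping bound, added once per step.
    noisy = [t + rng.gauss(0.0, noise_mult * max_norm) for t in total]
    # Assumption: normalise by the expected batch size q * n,
    # not by the realised (data-dependent) batch size.
    expected = q * len(xs)
    new_w = [wj - lr * nj / expected for wj, nj in zip(w, noisy)]
    return new_w, len(batch)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [[1.0, float(i % 5)] for i in range(100)]
    ys = [2.0 * x[1] + 1.0 for x in xs]
    w = [0.0, 0.0]
    for _ in range(5):
        w, batch_size = dp_sgd_step(w, xs, ys, 0.1, 1.0, 1.0, 0.1, rng)
        print(batch_size, w)
```

The realised batch size changes from step to step because every example is drawn independently. That variability is one reason this sampling scheme is awkward to combine with fixed-shape, compiled training loops, and sampling a fixed-size batch instead is the kind of faster shortcut the abstract refers to.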

Code & Models

Repositories

DPBayes/Towards-Efficient-Scalable-Training-DP-DL (JAX, official)

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques

Methods: Gradient Clipping · Stochastic Gradient Descent