Large Scale Transfer Learning for Differentially Private Image Classification
Harsh Mehta, Abhradeep Thakurta, Alexey Kurakin, Ashok Cutkosky

TL;DR
This paper demonstrates that large-scale pre-training and careful optimizer choice significantly improve differentially private image classification performance on ImageNet, achieving state-of-the-art results with reduced computational cost.
Contribution
The paper combines large-scale public pre-training, optimizer choice (LAMB with DP-SGD), and single-step last-layer fine-tuning to improve differentially private image classification on ImageNet, reaching new state-of-the-art results.
Findings
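Pre-training over-parameterized models on a large public dataset substantially improves utility when the model is then fine-tuned privately.
Using the LAMB optimizer with DP-SGD improves performance by up to 20 percentage points.
Fine-tuning only the last layer for a single step, starting from a small-scale initialization, achieves state-of-the-art results.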
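To make the optimizer finding more concrete, here is a minimal sketch of one LAMB update for a single layer, written in plain Python. It is an illustration of the general LAMB rule (Adam-style moments rescaled by a layer-wise trust ratio), not the authors' implementation. The function name `lamb_step`, the hyperparameter defaults, and the convention of using a trust ratio of 1 when either norm is zero are assumptions made for this example. The gradient `g` is assumed to have already been privatized (clipped and noised) upstream.

```python
import math

def lamb_step(w, g, m, v, t, lr=0.1, beta1=0.9, beta2=0.999,
              eps=1e-6, weight_decay=0.0):
    """One LAMB update for a single layer stored as a flat list of floats.

    w: weights, g: gradient (assumed already privatized),
    m, v: first and second moment estimates, t: 1-based step count.
    Returns (new_w, new_m, new_v).
    """
    # Adam-style exponential moving averages of the gradient and its square.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]

    # Bias correction for the zero-initialized moments.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]

    # Adam direction plus decoupled weight decay.
    update = [mh / (math.sqrt(vh) + eps) + weight_decay * wi
              for mh, vh, wi in zip(m_hat, v_hat, w)]

    # Layer-wise trust ratio: scale the step relative to the weight norm.
    w_norm = math.sqrt(sum(x * x for x in w))
    u_norm = math.sqrt(sum(x * x for x in update))
    # Assumed convention: fall back to 1.0 when either norm is zero,
    # which matters when the layer starts at (near-)zero initialization.
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0

    new_w = [wi - lr * trust * ui for wi, ui in zip(w, update)]
    return new_w, m, v
```

With a nonzero weight vector, the length of the resulting step equals `lr` times the weight norm regardless of the gradient's scale, which is the property that distinguishes LAMB from plain Adam.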
Abstract
Differential Privacy (DP) provides a formal framework for training machine learning models with individual example level privacy. In the field of deep learning, Differentially Private Stochastic Gradient Descent (DP-SGD) has emerged as a popular private training algorithm. Unfortunately, the computational cost of training large-scale models with DP-SGD is substantially higher than non-private training. This is further exacerbated by the fact that increasing the number of parameters leads to larger degradation in utility with DP. In this work, we zoom in on the ImageNet dataset and demonstrate that, similar to the non-private case, pre-training over-parameterized models on a large public dataset can lead to substantial gains when the model is finetuned privately. Moreover, by systematically comparing private and non-private models across a range of large batch sizes, we find that similar…
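As a rough illustration of the DP-SGD mechanism the abstract refers to, the sketch below shows the gradient privatization step: each per-example gradient is clipped to a maximum L2 norm, the clipped gradients are summed, Gaussian noise scaled to the clipping norm is added, and the result is averaged. The names `clip`, `private_mean_gradient`, `max_norm`, and `noise_multiplier` are assumptions chosen for this example, and privacy accounting (deriving the privacy budget from the noise level and number of steps) is deliberately left out.

```python
import math
import random

def clip(grad, max_norm):
    """Scale grad down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in grad]

def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each example's gradient, sum, add Gaussian noise, then average.

    per_example_grads: list of equal-length gradient vectors, one per example.
    noise_multiplier: noise standard deviation as a multiple of max_norm.
    rng: a random.Random instance, passed in so runs can be reproduced.
    """
    clipped = [clip(g, max_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]

    # Each example contributes at most max_norm to the sum, so the noise
    # standard deviation is expressed relative to max_norm.
    sigma = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]

    return [x / len(per_example_grads) for x in noisy]

# Usage: two per-example gradients, clipping norm 1.0, noise multiplier 1.1.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
noisy_mean = private_mean_gradient(grads, max_norm=1.0,
                                   noise_multiplier=1.1, rng=rng)
```

Computing and clipping a gradient per example, rather than one gradient per batch, is the main source of the extra computational cost the abstract mentions.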
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
Methods: Adam · LAMB
