Differentially Private Optimization for Non-Decomposable Objective Functions
Weiwei Kong, Andrés Muñoz Medina, Mónica Ribero

TL;DR
This paper introduces a new differentially private stochastic gradient descent (DP-SGD) method tailored for similarity-based loss functions like contrastive loss, effectively controlling gradient sensitivity and enabling privacy-preserving unsupervised pre-training.
Contribution
We develop a novel DP-SGD variant that manipulates gradients so that their sensitivity is independent of the batch size for contrastive loss functions, improving the privacy-utility trade-off.
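For context, the sketch below shows the aggregation step of standard DP-SGD, not the variant proposed in the paper. It assumes the loss is a sum of per-example terms, so clipping each per-example gradient to norm `clip_norm` bounds how much a single example can change the summed gradient. The function name `dp_sgd_noisy_mean` and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def dp_sgd_noisy_mean(per_example_grads, clip_norm, noise_multiplier, rng):
    """Standard DP-SGD aggregation (illustrative, not the paper's variant).

    per_example_grads: array of shape (batch_size, num_params).
    Returns the noisy mean gradient and the clipped per-example gradients.
    """
    # Scale each per-example gradient so its L2 norm is at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale

    # Gaussian noise calibrated to the clipping norm, which bounds the
    # contribution of any single example to the summed gradient.
    noise = rng.normal(
        scale=noise_multiplier * clip_norm,
        size=per_example_grads.shape[1],
    )
    batch_size = per_example_grads.shape[0]
    return (clipped.sum(axis=0) + noise) / batch_size, clipped

rng = np.random.default_rng(0)
grads = rng.normal(size=(32, 10)) * 5.0  # synthetic per-example gradients
noisy_mean, clipped = dp_sgd_noisy_mean(
    grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng
)
```

The per-example clipping step is what breaks down for contrastive loss: there is no single gradient term that belongs to exactly one example, which is the difficulty the paper addresses.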
Findings
The method achieves performance close to that of non-private models on CIFAR-10 and CIFAR-100.
Our method outperforms standard DP-SGD on contrastive loss.
Gradient sensitivity remains bounded regardless of batch size.
Abstract
Unsupervised pre-training is a common step in developing computer vision models and large language models. In this setting, the absence of labels requires the use of similarity-based loss functions, such as contrastive loss, that favor minimizing the distance between similar inputs and maximizing the distance between distinct inputs. As privacy concerns mount, training these models using differential privacy has become more important. However, due to how inputs are generated for these losses, one of their undesirable properties is that their sensitivity grows with the batch size. This property is particularly disadvantageous for differentially private training methods, such as DP-SGD. To overcome this issue, we develop a new DP-SGD variant for similarity-based loss functions -- in particular, the commonly used contrastive loss -- that manipulates gradients of the objective…
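The non-decomposability described in the abstract can be illustrated with a small numerical check. The sketch below assumes an InfoNCE-style contrastive loss in which each pair in the batch uses every other pair as a negative; the paper's exact loss may differ. All function names and values are hypothetical. It counts how many per-example loss terms change when a single pair in the batch is replaced, and compares this with a decomposable squared-error loss.

```python
import numpy as np

def per_example_contrastive_losses(x, y, temperature=0.5):
    """InfoNCE-style terms (an assumed form): row i uses (x[i], y[i]) as
    the positive pair and every y[j] with j != i as a negative."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    logits = x @ y.T / temperature
    # Numerically stable log-sum-exp over each row.
    row_max = logits.max(axis=1, keepdims=True)
    log_z = row_max[:, 0] + np.log(np.exp(logits - row_max).sum(axis=1))
    return log_z - np.diag(logits)

def per_example_squared_losses(x, y):
    """Decomposable baseline: term i depends only on (x[i], y[i])."""
    return ((x - y) ** 2).sum(axis=1)

def count_changed_terms(loss_fn, x, y, index, rng):
    """Replace the pair at `index` and count the loss terms that change."""
    x_new, y_new = x.copy(), y.copy()
    x_new[index] = rng.normal(size=x.shape[1])
    y_new[index] = rng.normal(size=y.shape[1])
    return int(np.sum(loss_fn(x, y) != loss_fn(x_new, y_new)))

rng = np.random.default_rng(0)
n, d = 8, 16
x = rng.normal(size=(n, d))
y = rng.normal(size=(n, d))

changed_contrastive = count_changed_terms(
    per_example_contrastive_losses, x, y, index=3, rng=rng
)
changed_decomposable = count_changed_terms(
    per_example_squared_losses, x, y, index=3, rng=rng
)
```

In this toy setup, replacing one pair alters every term of the contrastive loss but only one term of the decomposable loss, which is why naive per-example sensitivity analysis scales with the batch size.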
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
