Optimal Differentially Private Model Training with Public Data
Andrew Lowy, Zeman Li, Tianjian Huang, Meisam Razaviyayn

TL;DR
This paper characterizes the fundamental limits of differentially private model training with access to public data and introduces algorithms that optimally leverage public data to improve privacy-utility trade-offs.
Contribution
The paper proves tight (up to log factors) bounds on the worst-case error of DP model training with public data and develops algorithms that outperform existing methods in this setting.
Findings
Tight (up to log factors) lower and upper bounds for mean estimation, empirical risk minimization (ERM), and stochastic convex optimization under DP with public data.
Algorithms that improve on the constants of asymptotically optimal approaches; for local DP mean estimation, the proposed algorithm is optimal including constants.
Empirical results demonstrating the advantages of the proposed algorithms over state-of-the-art methods.
Abstract
Differential privacy (DP) ensures that training a machine learning model does not leak private data. In practice, we may have access to auxiliary public data that is free of privacy concerns. In this work, we assume access to a given amount of public data and settle the following fundamental open questions: 1. What is the optimal (worst-case) error of a DP model trained over a private data set while having access to side public data? 2. How can we harness public data to improve DP model training in practice? We consider these questions in both the local and central models of pure and approximate DP. To answer the first question, we prove tight (up to log factors) lower and upper bounds that characterize the optimal error rates of three fundamental problems: mean estimation, empirical risk minimization, and stochastic convex optimization. We show that the optimal error rates can be…
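To make the mean estimation setting concrete, here is a minimal sketch of two simple baselines in the central model of approximate DP: averaging only the public samples, and pooling public and private samples while treating all of them as private. These are illustrative baselines, not the paper's algorithms. The function names, the clipping radius, and the noise calibration (the classical Gaussian mechanism, which requires 0 < epsilon < 1) are assumptions made for this example.

```python
import math
import random


def clip_l2(x, radius):
    """Rescale x so that its L2 norm is at most radius."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm <= radius:
        return list(x)
    return [v * radius / norm for v in x]


def mean(vectors):
    """Coordinate-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def gaussian_sigma(n, radius, epsilon, delta):
    """Noise scale of the classical Gaussian mechanism (assumes 0 < epsilon < 1).

    Replacing one clipped sample moves the mean by at most 2 * radius / n.
    """
    sensitivity = 2.0 * radius / n
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def public_only_mean(public):
    """Baseline 1: ignore the private samples, so no noise is needed."""
    return mean(public)


def dp_mean_all_private(private, public, radius, epsilon, delta, rng):
    """Baseline 2: pool all samples, clip them, and add Gaussian noise to the mean."""
    samples = [clip_l2(x, radius) for x in private + public]
    sigma = gaussian_sigma(len(samples), radius, epsilon, delta)
    return [m + rng.gauss(0.0, sigma) for m in mean(samples)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 2-dimensional samples centered near (0.3, -0.2).
    draw = lambda: [0.3 + rng.gauss(0, 0.1), -0.2 + rng.gauss(0, 0.1)]
    private = [draw() for _ in range(1000)]
    public = [draw() for _ in range(50)]
    print("public only:", public_only_mean(public))
    print("all private:", dp_mean_all_private(private, public, 1.0, 0.5, 1e-5, rng))
```

Which baseline has lower error depends on the number of public and private samples, the dimension, and the privacy parameters: the first pays only sampling error on a small public set, while the second pays sampling error on the pooled set plus privacy noise.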
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Complexity and Algorithms in Graphs
