Optimal Differentially Private Model Training with Public Data

Andrew Lowy; Zeman Li; Tianjian Huang; Meisam Razaviyayn

arXiv:2306.15056 · cs.LG · September 11, 2024

TL;DR

This paper characterizes the fundamental limits of differentially private model training with access to public data and introduces algorithms that optimally leverage public data to improve privacy-utility trade-offs.

Contribution

It proves tight (up to log factors) lower and upper bounds on the worst-case error of DP models trained with access to public data, and develops algorithms that outperform existing methods in this setting.

Findings

01

Tight (up to log factors) lower and upper bounds for mean estimation, empirical risk minimization (ERM), and stochastic convex optimization under DP with public data.

02

Algorithms that match the asymptotically optimal rates and improve on their constants; for local DP mean estimation, the algorithm is optimal including constants.

03

Empirical results demonstrating the advantages of the proposed algorithms over state-of-the-art methods.

Abstract

Differential privacy (DP) ensures that training a machine learning model does not leak private data. In practice, we may have access to auxiliary public data that is free of privacy concerns. In this work, we assume access to a given amount of public data and settle the following fundamental open questions: 1. What is the optimal (worst-case) error of a DP model trained over a private data set while having access to side public data? 2. How can we harness public data to improve DP model training in practice? We consider these questions in both the local and central models of pure and approximate DP. To answer the first question, we prove tight (up to log factors) lower and upper bounds that characterize the optimal error rates of three fundamental problems: mean estimation, empirical risk minimization, and stochastic convex optimization. We show that the optimal error rates can be…
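To make the mean estimation setting concrete, here is a minimal sketch of two simple baselines one might compare in the central model of approximate DP: a clipped mean released through the classical Gaussian mechanism on the private samples, and a plain mean over the public samples only. This is not the paper's algorithm. The function names, the clipping norm, the sample sizes, and the synthetic Gaussian data are all assumptions made for illustration, and the noise calibration used is the classical one, which is valid for epsilon below 1.

```python
import numpy as np


def gaussian_mechanism_mean(x, clip_norm, epsilon, delta, rng):
    """Hypothetical helper: (epsilon, delta)-DP mean of the rows of x.

    Uses the classical Gaussian mechanism calibration (valid for epsilon < 1).
    """
    n, d = x.shape
    # Clip each row to L2 norm at most clip_norm.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    clipped = x * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # Replacing one row moves the clipped mean by at most 2 * clip_norm / n.
    sensitivity = 2.0 * clip_norm / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)


def public_only_mean(x_pub):
    """Hypothetical helper: ignore private data, average the public samples."""
    return x_pub.mean(axis=0)


# Synthetic data (an assumption for illustration): same distribution for both sets.
rng = np.random.default_rng(0)
true_mean = np.full(5, 0.5)
x_priv = true_mean + rng.normal(size=(2000, 5))
x_pub = true_mean + rng.normal(size=(50, 5))

dp_est = gaussian_mechanism_mean(x_priv, clip_norm=10.0, epsilon=0.5, delta=1e-5, rng=rng)
pub_est = public_only_mean(x_pub)

print("private (DP) estimate error:", np.linalg.norm(dp_est - true_mean))
print("public-only estimate error: ", np.linalg.norm(pub_est - true_mean))
```

Which baseline has lower error depends on the number of private and public samples, the dimension, and the privacy parameters; varying `epsilon` or the size of `x_pub` in the sketch shows the trade-off.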

Code & Models

Repositories

optimization-for-data-driven-science/dp-with-public-data
PyTorch · Official

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Complexity and Algorithms in Graphs