Privacy for Free: How does Dataset Condensation Help Privacy?

Tian Dong; Bo Zhao; Lingjuan Lyu

arXiv:2206.00240 · cs.CR · June 2, 2022 · 31 cites

TL;DR

This paper demonstrates that dataset condensation can simultaneously improve training efficiency and provide privacy guarantees, serving as a cost-effective alternative to traditional private data generators.

Contribution

It introduces the novel idea that dataset condensation inherently offers privacy benefits and provides theoretical and empirical evidence supporting this claim.

Findings

01. Theoretically shows limited impact of individual samples on model parameters in DC.

02. Empirically validates privacy protection against membership inference attacks.

03. Establishes DC as a privacy-preserving data generation method.
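
To make the membership inference finding concrete, here is a minimal sketch of a generic loss-threshold attack. It is an illustration only and not necessarily the attack evaluated in the paper. The function names, the toy loss values, and the threshold are all hypothetical. The attacker guesses that a sample was in the training set when the model's loss on it falls below a threshold, and the attack advantage (true positive rate minus false positive rate) measures how much better than chance that guess is.

```python
from typing import List, Sequence


def loss_threshold_attack(losses: Sequence[float], threshold: float) -> List[bool]:
    """Guess membership per sample: True when the loss is below the threshold."""
    return [loss < threshold for loss in losses]


def attack_advantage(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
    threshold: float,
) -> float:
    """Return TPR - FPR for the threshold attack; 0.0 means no better than chance."""
    # Fraction of true members correctly flagged as members.
    tpr = sum(loss_threshold_attack(member_losses, threshold)) / len(member_losses)
    # Fraction of non-members wrongly flagged as members.
    fpr = sum(loss_threshold_attack(nonmember_losses, threshold)) / len(nonmember_losses)
    return tpr - fpr


# Hypothetical per-sample losses (made up for illustration, not results from the paper).
member_losses = [0.05, 0.10, 0.20, 0.90]
nonmember_losses = [0.30, 0.80, 1.20, 1.50]
threshold = 0.25

advantage = attack_advantage(member_losses, nonmember_losses, threshold)
print(f"attack advantage: {advantage:.2f}")
```

An advantage close to 0 indicates that the attacker cannot separate training samples from unseen samples using the loss alone, which is the kind of outcome a privacy-preserving training pipeline aims for.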

Abstract

To prevent unintentional data leakage, the research community has resorted to data generators that can produce differentially private data for model training. However, for the sake of data privacy, existing solutions suffer from either expensive training cost or poor generalization performance. Therefore, we raise the question of whether training efficiency and privacy can be achieved simultaneously. In this work, we identify for the first time that dataset condensation (DC), which was originally designed to improve training efficiency, is also a better solution for replacing traditional data generators in private data generation, thus providing privacy for free. To demonstrate the privacy benefit of DC, we build a connection between DC and differential privacy, and theoretically prove on linear feature extractors (and then extended to non-linear feature extractors) that the existence of…

Code & Models

Repositories

Guang000/Awesome-Dataset-Distillation

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning