Privacy for Free: How does Dataset Condensation Help Privacy?
Tian Dong, Bo Zhao, Lingjuan Lyu

TL;DR
This paper demonstrates that dataset condensation can simultaneously improve training efficiency and provide privacy guarantees, serving as a cost-effective alternative to traditional private data generators.
Contribution
It introduces the novel idea that dataset condensation inherently offers privacy benefits and provides theoretical and empirical evidence supporting this claim.
Findings
Shows theoretically that an individual sample has limited impact on model parameters under dataset condensation (DC).
Empirically validates privacy protection against membership inference attacks.
Establishes DC as a privacy-preserving data generation method.
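To make the membership inference finding concrete, here is a minimal sketch of a loss-threshold membership inference attack, one common baseline form of such attacks. It is not the paper's evaluation protocol: the function name `loss_threshold_attack`, the fixed threshold, and the toy loss values are all illustrative assumptions.

```python
from typing import Dict, Sequence


def loss_threshold_attack(
    member_losses: Sequence[float],
    nonmember_losses: Sequence[float],
    threshold: float,
) -> Dict[str, float]:
    """Score a loss-threshold membership inference attack.

    The attacker guesses "member" whenever a sample's loss under the
    target model is below `threshold` (hypothetical, attacker-chosen).
    """
    if not member_losses or not nonmember_losses:
        raise ValueError("both loss lists must be non-empty")

    # True positive rate: fraction of real members flagged as members.
    tpr = sum(loss < threshold for loss in member_losses) / len(member_losses)
    # False positive rate: fraction of non-members wrongly flagged as members.
    fpr = sum(loss < threshold for loss in nonmember_losses) / len(nonmember_losses)

    return {
        "tpr": tpr,
        "fpr": fpr,
        # Advantage near 0 means the attacker does no better than guessing.
        "advantage": tpr - fpr,
        "balanced_accuracy": 0.5 * (tpr + (1.0 - fpr)),
    }


# Toy per-sample losses (made up for illustration, not measured results).
result = loss_threshold_attack(
    member_losses=[0.1, 0.2, 0.3, 0.9],
    nonmember_losses=[0.8, 1.0, 1.2, 0.15],
    threshold=0.5,
)
print(result)
```

Under this sketch, a model that leaks little about its training data would give an advantage close to 0 and a balanced accuracy close to 0.5, since member and non-member losses would be hard to tell apart.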
Abstract
To prevent unintentional data leakage, the research community has resorted to data generators that can produce differentially private data for model training. However, for the sake of data privacy, existing solutions suffer from either expensive training cost or poor generalization performance. Therefore, we raise the question of whether training efficiency and privacy can be achieved simultaneously. In this work, we identify for the first time that dataset condensation (DC), which was originally designed to improve training efficiency, is also a better solution to replace traditional data generators for private data generation, thus providing privacy for free. To demonstrate the privacy benefit of DC, we build a connection between DC and differential privacy, and theoretically prove on linear feature extractors (and then extended to non-linear feature extractors) that the existence of…
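For reference, the standard textbook definition of differential privacy, which the abstract connects DC to, can be written as below. The symbols $\mathcal{M}$, $D$, $D'$, $S$, $\varepsilon$, and $\delta$ are the usual generic notation and are assumed here; the truncated abstract does not show the paper's own statement, which may differ.

```latex
% A randomized mechanism \mathcal{M} is (\varepsilon, \delta)-differentially private
% if, for all datasets D and D' differing in a single record and for every
% measurable set of outputs S:
\[
  \Pr\bigl[\mathcal{M}(D) \in S\bigr]
  \;\le\;
  e^{\varepsilon}\,\Pr\bigl[\mathcal{M}(D') \in S\bigr] + \delta .
\]
```

Smaller $\varepsilon$ and $\delta$ mean that the presence or absence of any one record changes the output distribution less.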
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning
