Improving Noise Efficiency in Privacy-preserving Dataset Distillation
Runkai Zheng, Vishnu Asutosh Dasu, Yinong Oliver Wang, Haohan Wang, Fernando De la Torre

TL;DR
This paper proposes a new method for privacy-preserving dataset distillation that improves efficiency and accuracy by decoupling sampling and optimization and reducing the impact of differential privacy noise, leading to better synthetic datasets.
Contribution
The authors introduce a novel framework that enhances private dataset distillation by decoupling sampling from optimization and mitigating DP noise through subspace matching, improving convergence and signal quality.
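A toy sketch of why matching in a low-dimensional subspace can reduce the impact of DP noise. This is not the authors' algorithm: the function names, the random orthonormal `basis`, and the values of `sigma`, `d`, and `k` are all assumptions made for illustration, and `sigma` is not calibrated to any specific privacy budget. The sketch also assumes the basis is chosen independently of the private data; a basis fitted to private data would itself consume privacy budget. It releases a clipped, Gaussian-noised mean once in the full d-dimensional space and once in a k-dimensional subspace, then compares the error of each against the noise-free mean.

```python
import numpy as np


def clip_rows(x, max_norm):
    """Scale each row so its L2 norm is at most max_norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x * np.minimum(1.0, max_norm / np.maximum(norms, 1e-12))


def noisy_mean(x, max_norm, sigma, rng):
    """Clip rows, sum them, add Gaussian noise to the sum, divide by n."""
    n, d = x.shape
    total = clip_rows(x, max_norm).sum(axis=0)
    noise = rng.normal(0.0, sigma * max_norm, size=d)
    return (total + noise) / n


def compare(n=200, d=512, k=16, sigma=5.0, max_norm=1.0, seed=0):
    """Return (full-space error, subspace error) for one noisy release each."""
    rng = np.random.default_rng(seed)
    # Hypothetical data-independent orthonormal basis of shape (d, k).
    basis, _ = np.linalg.qr(rng.normal(size=(d, k)))
    # Toy features that lie exactly in the span of the basis.
    x = rng.normal(size=(n, k)) @ basis.T
    true_mean = clip_rows(x, max_norm).mean(axis=0)

    # Noise is added to all d coordinates.
    full = noisy_mean(x, max_norm, sigma, rng)
    # Project to k coordinates, add noise there, then lift back to d dims.
    low = noisy_mean(x @ basis, max_norm, sigma, rng) @ basis.T

    err_full = np.linalg.norm(full - true_mean)
    err_low = np.linalg.norm(low - true_mean)
    return err_full, err_low


if __name__ == "__main__":
    err_full, err_low = compare()
    print(f"full-space error: {err_full:.3f}, subspace error: {err_low:.3f}")
```

With the same noise scale per coordinate, the expected noise norm grows roughly with the square root of the number of coordinates, so the subspace release carries less noise whenever the signal is well captured by the k directions. Any optimization run afterwards against the released noisy target is post-processing and incurs no additional privacy cost, which is one way to read the decoupling of sampling from optimization.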
Findings
Achieves a 10.0% improvement over previous methods on CIFAR-10 with 50 images per class.
Achieves an 8.3% improvement while using a distilled dataset one-fifth the size used by previous methods.
Demonstrates significant advancement in privacy-preserving dataset distillation.
Abstract
Modern machine learning models heavily rely on large datasets that often include sensitive and private information, raising serious privacy concerns. Differentially private (DP) data generation offers a solution by creating synthetic datasets that limit the leakage of private information within a predefined privacy budget; however, it requires a substantial amount of data to achieve performance comparable to models trained on the original data. To mitigate the significant expense incurred with synthetic data generation, Dataset Distillation (DD) stands out for its remarkable training and storage efficiency. This efficiency is particularly advantageous when integrated with DP mechanisms, curating compact yet informative synthetic datasets without compromising privacy. However, current state-of-the-art private DD methods suffer from a synchronized sampling-optimization process and the…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Stochastic Gradient Optimization Techniques
