Private Training & Data Generation by Clustering Embeddings
Felix Zhou, Samson Zhou, Vahab Mirrokni, Alessandro Epasto, Vincent Cohen-Addad

TL;DR
This paper presents a differentially private (DP) method for generating synthetic image embeddings by fitting a Gaussian Mixture Model (GMM) with DP clustering; the synthetic embeddings can then be used to train neural networks.
Contribution
The authors introduce a principled method that uses DP clustering to fit a GMM in an embedding space and generate synthetic embeddings from it. The method provably learns a GMM under separation conditions.
Findings
Achieves state-of-the-art classification accuracy with synthetic embeddings
Generates realistic synthetic images with high downstream task performance
Method is scalable and adaptable to different tasks
Abstract
Deep neural networks often use large, high-quality datasets to achieve high performance on many machine learning tasks. When training involves potentially sensitive data, this process can raise privacy concerns, as large models have been shown to unintentionally memorize and reveal sensitive information, including reconstructing entire training samples. Differential privacy (DP) provides a robust framework for protecting individual data. In particular, a new approach to privately training deep neural networks is to approximate the input dataset with a privately generated synthetic dataset before running any subsequent training algorithm. We introduce a novel principled method for DP synthetic image embedding generation, based on fitting a Gaussian Mixture Model (GMM) in an appropriate embedding space using DP clustering. Our method provably learns a GMM under separation conditions.…
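The idea of fitting a GMM from privatized cluster statistics and then sampling synthetic embeddings can be illustrated with a minimal sketch. This is not the paper's algorithm: the function names `noisy_gmm_from_clusters` and `sample_synthetic` are hypothetical, the cluster centers are assumed to come from some DP clustering routine that is not implemented here, the component spread `scale` is a fixed assumed value rather than a privately estimated covariance, and the noise level `sigma` is a free parameter that is not calibrated to any particular (ε, δ) guarantee.

```python
import numpy as np

def noisy_gmm_from_clusters(X, centers, clip_norm, sigma, rng):
    """Estimate mixture weights and means from noisy per-cluster counts and sums.

    Assumption: `centers` were produced by a DP clustering step (not shown).
    Assumption: `sigma` is chosen by a separate privacy analysis (not shown).
    """
    # Clip each embedding to L2 norm <= clip_norm so one record has bounded influence.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xc = X * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))

    # Assign every clipped point to its nearest center.
    d2 = ((Xc[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)

    k, d = centers.shape
    counts = np.zeros(k)
    means = np.zeros((k, d))
    for j in range(k):
        pts = Xc[labels == j]
        # Gaussian noise on the count; floor at 1 to avoid dividing by ~0.
        counts[j] = max(len(pts) + rng.normal(0.0, sigma), 1.0)
        # Gaussian noise on the sum, scaled by the clipping norm.
        noisy_sum = pts.sum(axis=0) + rng.normal(0.0, sigma * clip_norm, size=d)
        means[j] = noisy_sum / counts[j]

    weights = counts / counts.sum()
    return weights, means

def sample_synthetic(weights, means, scale, n, rng):
    """Draw n synthetic embeddings from a spherical GMM with fixed std `scale`."""
    comp = rng.choice(len(weights), size=n, p=weights)
    noise = rng.normal(0.0, scale, size=(n, means.shape[1]))
    return means[comp] + noise, comp
```

Because sampling only touches the noisy weights and means, any number of synthetic embeddings can be drawn without further access to the original data; a downstream classifier would then be trained on the sampled embeddings and their component labels.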
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning
