DP-MERF: Differentially Private Mean Embeddings with Random Features for Practical Privacy-Preserving Data Generation
Frederik Harder, Kamil Adamczewski, Mijung Park

TL;DR
DP-MERF is a differentially private data generation method that compares true and synthetic data through random feature representations of kernel mean embeddings. The data-dependent term is perturbed only once and its sensitivity is known analytically, which keeps the privacy cost of training a generator low.
Contribution
The paper presents a new DP data generation approach based on random features for kernel mean embeddings. It lowers the privacy cost of training and, because the random features are norm bounded by construction, yields an analytic sensitivity without a hyper-parameter search.
Findings
Achieves better privacy-utility trade-offs than existing methods.
Applicable to heterogeneous tabular and image data.
Perturbs the data-dependent term only once, then reuses it throughout generator training.
Abstract
We propose a differentially private data generation paradigm using random feature representations of kernel mean embeddings when comparing the distribution of true data with that of synthetic data. We exploit the random feature representations for two important benefits. First, we require a minimal privacy cost for training deep generative models. This is because unlike kernel-based distance metrics that require computing the kernel matrix on all pairs of true and synthetic data points, we can detach the data-dependent term from the term solely dependent on synthetic data. Hence, we need to perturb the data-dependent term only once and then use it repeatedly during the generator training. Second, we can obtain an analytic sensitivity of the kernel mean embedding as the random features are norm bounded by construction. This removes the necessity of hyper-parameter search for a clipping…
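The two benefits described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. Its assumptions: random Fourier features for a Gaussian kernel, scaled so every feature vector has unit L2 norm; replace-one neighbouring datasets, which gives the mean embedding of m points an L2 sensitivity of at most 2/m; and the classic Gaussian-mechanism noise calibration (valid for epsilon < 1) in place of whatever privacy accounting the paper uses. All function names are hypothetical.

```python
import numpy as np


def random_fourier_features(x, freqs):
    """Map each row of x (n, d) to a feature vector with unit L2 norm.

    freqs has shape (d, n_freq); the output has shape (n, 2 * n_freq).
    """
    proj = x @ freqs
    n_freq = freqs.shape[1]
    # cos^2 + sin^2 = 1 per frequency, so dividing by sqrt(n_freq)
    # makes every row have norm exactly 1.
    return np.concatenate([np.cos(proj), np.sin(proj)], axis=1) / np.sqrt(n_freq)


def private_mean_embedding(data, freqs, epsilon, delta, rng):
    """Release the mean embedding of the private data once, with Gaussian noise."""
    m = data.shape[0]
    mean_emb = random_fourier_features(data, freqs).mean(axis=0)
    # Assumption: replace-one neighbours and unit-norm features, so the
    # mean moves by at most 2 / m when a single record changes.
    sensitivity = 2.0 / m
    # Assumption: classic Gaussian mechanism calibration (epsilon < 1).
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return mean_emb + rng.normal(0.0, sigma, size=mean_emb.shape)


def embedding_loss(noisy_emb, synthetic, freqs):
    """Squared distance between the noisy data embedding and the synthetic one.

    This touches only the released embedding and synthetic samples, so it
    can be evaluated repeatedly without further privacy cost.
    """
    diff = noisy_emb - random_fourier_features(synthetic, freqs).mean(axis=0)
    return float(diff @ diff)


rng = np.random.default_rng(0)
private_data = rng.normal(size=(1000, 5))   # stand-in for the true data
freqs = rng.normal(size=(5, 200))           # frequencies for a unit-bandwidth Gaussian kernel
noisy_emb = private_mean_embedding(private_data, freqs, epsilon=0.5, delta=1e-5, rng=rng)

# In generator training, `synthetic` would be a fresh generator batch at each step.
synthetic = rng.normal(loc=0.5, size=(256, 5))
print(embedding_loss(noisy_emb, synthetic, freqs))
```

In this sketch the private data is accessed only inside `private_mean_embedding`. Every later loss evaluation is post-processing of the released vector, which is why a single perturbation suffices, and the sensitivity follows from the feature scaling rather than from a tuned clipping threshold.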
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Face recognition and analysis
