Membership Inference Attacks against Synthetic Data through Overfitting Detection
Boris van Breugel, Hao Sun, Zhaozhi Qian, Mihaela van der Schaar

TL;DR
This paper introduces DOMIAS, a density-based membership inference attack that detects whether a given real sample was used to train a generative model, highlighting privacy risks, especially for underrepresented groups.
Contribution
The work proposes a realistic MIA setting and introduces DOMIAS, a novel density-based attack that outperforms previous methods in detecting membership, particularly for uncommon samples.
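The idea of a density-based attack that detects overfitting can be illustrated with a small sketch. It is not the authors' implementation: this summary does not specify the density estimator, so a hand-written Gaussian kernel density estimate stands in for it. The sketch also assumes the attacker holds a reference sample from the underlying population. The function names, the bandwidth, and the toy "overfitted generator" are all hypothetical. Each candidate is scored by the log-ratio of the density estimated from synthetic data to the density estimated from reference data; a high score suggests the generator placed extra mass near that candidate.

```python
import numpy as np


def gaussian_kde_logpdf(points, data, bandwidth):
    """Log-density of an isotropic Gaussian KDE fitted on `data`, evaluated at `points`."""
    points = np.atleast_2d(points)
    data = np.atleast_2d(data)
    n, d = data.shape
    # Squared distance between every query point and every data point.
    sq_dist = ((points[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
    log_kernel = -0.5 * sq_dist / bandwidth**2
    log_norm = np.log(n) + 0.5 * d * np.log(2 * np.pi * bandwidth**2)
    # Log-sum-exp over data points for numerical stability.
    m = log_kernel.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(log_kernel - m).sum(axis=1)) - log_norm


def density_ratio_scores(candidates, synthetic, reference, bandwidth=0.2):
    """Higher score: synthetic density exceeds reference density (suspected member)."""
    return (gaussian_kde_logpdf(candidates, synthetic, bandwidth)
            - gaussian_kde_logpdf(candidates, reference, bandwidth))


# Toy setup (entirely illustrative): a "generator" that overfits by emitting
# noisy copies of its 20 training points.
rng = np.random.default_rng(0)
train = rng.normal(size=(20, 2))            # members
non_members = rng.normal(size=(200, 2))     # same population, never seen in training
reference = rng.normal(size=(500, 2))       # attacker's assumed reference sample
synthetic = train[rng.integers(0, len(train), size=500)] + 0.05 * rng.normal(size=(500, 2))

member_scores = density_ratio_scores(train, synthetic, reference)
non_member_scores = density_ratio_scores(non_members, synthetic, reference)
print("mean member score:    ", member_scores.mean())
print("mean non-member score:", non_member_scores.mean())
```

In this toy setup the members should score higher on average, because the overfitted generator concentrates synthetic density around them. Dividing by the reference density is what keeps a sample from being flagged merely because it lies in a dense region of the population.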
Findings
DOMIAS significantly improves MIA success over prior methods.
It is especially effective on underrepresented, uncommon samples.
It provides an interpretable privacy metric for synthetic data.
Abstract
Data is the foundation of most science. Unfortunately, sharing data can be obstructed by the risk of violating data privacy, impeding research in fields like healthcare. Synthetic data is a potential solution. It aims to generate data that has the same distribution as the original data, but that does not disclose information about individuals. Membership Inference Attacks (MIAs) are a common privacy attack, in which the attacker attempts to determine whether a particular real sample was used for training of the model. Previous works that propose MIAs against generative models either display low performance -- giving the false impression that data is highly private -- or need to assume access to internal generative model parameters -- a relatively low-risk scenario, as the data publisher often only releases synthetic data, not the model. In this work we argue for a realistic MIA setting…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Machine Learning in Healthcare
