Synthetic is all you need: removing the auxiliary data assumption for membership inference attacks against synthetic data
Florent Guépin, Matthieu Meeus, Ana-Maria Cretu, Yves-Alexandre de Montjoye

TL;DR
This paper demonstrates that membership inference attacks on synthetic data can be effectively performed without auxiliary datasets, broadening the practical applicability of privacy evaluations.
Contribution
It introduces methods to perform MIAs using only synthetic data, removing the strong assumption of auxiliary dataset access, and validates their effectiveness across multiple scenarios and datasets.
Findings
MIAs remain successful without auxiliary data.
Synthetic data privacy can be assessed more realistically.
Attacks perform well across different datasets and generators.
Abstract
Synthetic data is emerging as one of the most promising solutions for sharing individual-level data while safeguarding privacy. While membership inference attacks (MIAs) based on shadow modeling have become the standard for evaluating the privacy of synthetic data, they currently assume that the attacker has access to an auxiliary dataset sampled from a distribution similar to that of the training dataset. This is often seen as a very strong assumption in practice, especially as the main proposed use cases for synthetic tabular data (e.g. medical data, financial transactions) are very specific and have no reference datasets directly available. Here we show how this assumption can be removed, allowing MIAs to be performed using only the synthetic data. Specifically, we developed three different scenarios: (S1) black-box access to the generator, (S2) only access to the released synthetic…
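Illustrative sketch
The following is a minimal sketch of the general idea in the abstract, namely shadow modeling in which the released synthetic data stands in for the auxiliary dataset. It is not the authors' implementation. The toy Gaussian generator, the distance-based features, the threshold meta-classifier, and every function name are assumptions made for illustration. The sketch also assumes the attacker can rerun the generator's training procedure on shadow datasets.

```python
import numpy as np

def fit_generator(train):
    """Toy generator (assumption): a Gaussian fitted to the training records."""
    mean = train.mean(axis=0)
    # Small ridge term keeps the covariance matrix positive definite.
    cov = np.cov(train, rowvar=False) + 1e-6 * np.eye(train.shape[1])
    return mean, cov

def sample_generator(model, n, rng):
    """Draw n synthetic records from the fitted toy generator."""
    mean, cov = model
    return rng.multivariate_normal(mean, cov, size=n)

def features(synthetic, target):
    """Hypothetical attack features: min and mean distance from target to synthetic records."""
    d = np.linalg.norm(synthetic - target, axis=1)
    return np.array([d.min(), d.mean()])

def build_shadow_dataset(released_synth, target, n_shadow, n_train, rng):
    """Shadow modeling with the released synthetic data used in place of auxiliary data."""
    X, y = [], []
    for i in range(n_shadow):
        idx = rng.choice(len(released_synth), size=n_train, replace=False)
        shadow_train = released_synth[idx].copy()
        is_member = i % 2  # half of the shadow training sets contain the target
        if is_member:
            shadow_train[0] = target  # swap one sampled record for the target
        shadow_synth = sample_generator(fit_generator(shadow_train), n_train, rng)
        X.append(features(shadow_synth, target))
        y.append(is_member)
    return np.array(X), np.array(y)

def fit_threshold(X, y):
    """Toy meta-classifier (assumption): midpoint of the class means of the min-distance feature."""
    return 0.5 * (X[y == 1, 0].mean() + X[y == 0, 0].mean())

rng = np.random.default_rng(0)
private_train = rng.normal(size=(200, 3))       # stand-in for the private data
released_synth = sample_generator(fit_generator(private_train), 200, rng)
target = private_train[0]                       # record whose membership is tested

X, y = build_shadow_dataset(released_synth, target, n_shadow=40, n_train=100, rng=rng)
threshold = fit_threshold(X, y)
score = features(released_synth, target)[0]
# Assumed decision rule: a smaller min-distance is taken as evidence of membership.
guess_member = bool(score < threshold)
```

With a generator this simple, a single record barely changes the fitted distribution, so the guess is not expected to be reliable. The sketch only shows the data flow: sample shadow training sets from the synthetic data, train shadow generators with and without the target, and fit a meta-classifier on features of their outputs.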
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Internet Traffic Analysis and Secure E-voting · Cryptography and Data Security
