Membership Inference Attacks against Synthetic Data through Overfitting Detection

Boris van Breugel; Hao Sun; Zhaozhi Qian; Mihaela van der Schaar

arXiv:2302.12580·cs.LG·February 27, 2023·25 cites

TL;DR

This paper introduces DOMIAS, a density-based membership inference attack that infers whether a particular real sample was used to train a generative model, highlighting privacy risks that fall especially on underrepresented groups.

Contribution

The work proposes a realistic MIA setting and introduces DOMIAS, a novel density-based attack that outperforms previous methods in detecting membership, particularly for uncommon samples.
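
To make the idea of a density-based attack concrete, the following is a minimal sketch rather than the authors' implementation. It assumes the attacker holds only the released synthetic data and a separate reference sample from the underlying population, and it scores each candidate by the ratio of the two estimated densities: a ratio well above 1 suggests the generator has overfitted locally around that point. The 1-D toy data, the Gaussian kernel density estimator, the bandwidth, and the function names `gaussian_kde` and `density_ratio_scores` are all illustrative assumptions.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the 1-D density of `samples`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


def density_ratio_scores(candidates, synthetic, reference, bandwidth=0.3, eps=1e-12):
    """Score each candidate by p_synthetic(x) / p_reference(x).

    Higher scores mean the synthetic data is denser around x than the
    reference population would explain, which hints at memorisation.
    """
    p_syn = gaussian_kde(synthetic, bandwidth)
    p_ref = gaussian_kde(reference, bandwidth)
    return [p_syn(x) / (p_ref(x) + eps) for x in candidates]


# Toy setup (hypothetical data): three training points the generator has seen.
train_members = [-2.0, 0.5, 2.5]

# An overfitted "generator" that only emits near-copies of its training points.
synthetic = [t + jitter for t in train_members for jitter in (-0.1, 0.0, 0.1)]

# Reference data spread evenly over the population range [-3, 3].
reference = [-3.0 + 0.25 * i for i in range(25)]

# Points from the same range that were not used for training.
non_members = [-1.0, 1.5]

member_scores = density_ratio_scores(train_members, synthetic, reference)
non_member_scores = density_ratio_scores(non_members, synthetic, reference)

print("member scores:    ", [round(s, 3) for s in member_scores])
print("non-member scores:", [round(s, 3) for s in non_member_scores])
```

In this toy setup every training point scores above 1 and every non-member scores below 1, so thresholding the ratio separates the two groups. Real tabular data is higher-dimensional, and the kernel density estimator here stands in for whatever density estimator an attacker would actually choose.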

Findings

01 DOMIAS significantly improves MIA success over prior methods.

02 It is especially effective on underrepresented, uncommon samples.

03 It provides an interpretable privacy metric for synthetic data.

Abstract

Data is the foundation of most science. Unfortunately, sharing data can be obstructed by the risk of violating data privacy, impeding research in fields like healthcare. Synthetic data is a potential solution. It aims to generate data that has the same distribution as the original data, but that does not disclose information about individuals. Membership Inference Attacks (MIAs) are a common privacy attack, in which the attacker attempts to determine whether a particular real sample was used for training of the model. Previous works that propose MIAs against generative models either display low performance -- giving the false impression that data is highly private -- or need to assume access to internal generative model parameters -- a relatively low-risk scenario, as the data publisher often only releases synthetic data, not the model. In this work we argue for a realistic MIA setting…

Code & Models

Repositories

vanderschaarlab/domias
PyTorch · Official

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Machine Learning in Healthcare