TL;DR
ReMIA is a practical, efficient membership inference attack against synthetic data generators. It requires only two generator training runs and auxiliary data no larger than the original training set, with sensitivity comparable to state-of-the-art methods.
Contribution
The paper introduces ReMIA, a privacy metric that is more practical than existing MIAs: it needs only two SDG training runs and auxiliary data no larger than the original training set, while maintaining high sensitivity.
Findings
ReMIA achieves sensitivity comparable to state-of-the-art MIAs.
ReMIA requires only two SDG training runs and auxiliary data no larger than the original training set.
SDGs can balance privacy and utility better than traditional methods.
Abstract
Tabular data sharing under privacy constraints is increasingly important for research and collaboration. Synthetic data generators (SDGs) are a promising solution, but synthetic data remains vulnerable to attacks, such as membership inference attacks (MIAs), which aim to determine whether a specific record was part of the training data. State-of-the-art MIAs are powerful but impractical: they rely on shadow modeling, requiring hundreds of SDG training runs, and need auxiliary data several times larger than the original training set. Fast proxy metrics like distance to closest record (DCR) are efficient but have limited sensitivity to MIA risk. We introduce ReMIA (Relative Membership Inference Attack), a practical privacy metric that requires only two SDG training runs and additional data no larger than the original training set. Rather than predicting whether a record was in the…
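The distance to closest record (DCR) proxy mentioned in the abstract can be illustrated with a minimal sketch. This shows the DCR baseline only, not ReMIA itself, whose procedure is cut off in the abstract above. It assumes purely numeric features and Euclidean distance; the function name and the toy records are hypothetical and chosen for illustration.

```python
import math

def distance_to_closest_record(synthetic, train):
    """For each synthetic record, return the Euclidean distance to its
    nearest training record. Assumes numeric features of equal length."""
    return [min(math.dist(s, t) for t in train) for s in synthetic]

# Toy data (hypothetical): three training records, two synthetic records.
train = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
synthetic = [(0.0, 0.0), (3.0, 0.0)]

dcr = distance_to_closest_record(synthetic, train)
# A distance of zero means a synthetic record exactly copies a training record.
print(dcr)
```

A DCR near zero flags synthetic records that sit very close to real ones, which is cheap to compute but, as the abstract notes, has limited sensitivity to MIA risk.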
