A Linear Reconstruction Approach for Attribute Inference Attacks against Synthetic Data
Meenatchi Sundaram Muthu Selva Annamalai, Andrea Gadotti, Luc Rocher

TL;DR
This paper introduces a linear reconstruction-based attribute inference attack on synthetic data that can target all records, revealing privacy vulnerabilities in state-of-the-art synthetic data generation methods.
Contribution
It presents a novel attack method applicable to a range of synthetic data generation (SDG) algorithms, demonstrates significant privacy risks, and analyzes the utility-privacy tradeoff.
Findings
Current SDG methods are vulnerable to inference attacks.
Differentially private SDG offers some protection, but at a cost in utility.
Releasing more synthetic records increases utility but also attack effectiveness.
Abstract
Recent advances in synthetic data generation (SDG) have been hailed as a solution to the difficult problem of sharing sensitive data while protecting privacy. SDG aims to learn statistical properties of real data in order to generate "artificial" data that are structurally and statistically similar to sensitive data. However, prior research suggests that inference attacks on synthetic data can undermine privacy, but only for specific outlier records. In this work, we introduce a new attribute inference attack against synthetic data. The attack is based on linear reconstruction methods for aggregate statistics, which target all records in the dataset, not only outliers. We evaluate our attack on state-of-the-art SDG algorithms, including Probabilistic Graphical Models, Generative Adversarial Networks, and recent differentially private SDG mechanisms. By defining a formal privacy game, we…
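To illustrate the general idea behind linear reconstruction from aggregate statistics, here is a minimal sketch in the style of classic reconstruction attacks, not the authors' exact method. It assumes each record has one secret binary attribute, that each query counts the records in a known subset whose secret is 1, and that the attacker only sees approximate answers. In the synthetic-data setting such answers would be estimated from the released synthetic records; here that estimation error is simulated with Gaussian noise. The function names `linear_reconstruction` and `simulate`, the random 0/1 query matrix, and the least-squares-then-round solver are all hypothetical choices; the paper's queries and optimization may differ (for example, queries defined over quasi-identifiers, or a constrained linear program).

```python
import numpy as np


def linear_reconstruction(A, answers):
    """Recover secret bits from approximate linear query answers.

    Hypothetical solver: an unconstrained least-squares relaxation,
    clipped to [0, 1], then rounded to {0, 1}.
    """
    x, *_ = np.linalg.lstsq(A, answers, rcond=None)
    x = np.clip(x, 0.0, 1.0)  # project the relaxed solution onto [0, 1]
    return (x >= 0.5).astype(int)


def simulate(n_records=40, n_queries=400, noise_scale=1.0, seed=0):
    """Return the fraction of secret bits the attack recovers correctly."""
    rng = np.random.default_rng(seed)
    secret = rng.integers(0, 2, size=n_records)  # one secret bit per record
    # Assumption: query j counts secret=1 records in a random subset A[j].
    A = rng.integers(0, 2, size=(n_queries, n_records)).astype(float)
    true_answers = A @ secret
    # Stand-in for the error of answers estimated from synthetic data.
    noisy_answers = true_answers + rng.normal(0.0, noise_scale, size=n_queries)
    guess = linear_reconstruction(A, noisy_answers)
    return float(np.mean(guess == secret))


if __name__ == "__main__":
    for scale in (0.0, 1.0, 50.0):
        print(f"noise_scale={scale}: accuracy={simulate(noise_scale=scale):.2f}")
```

With exact or mildly noisy answers, almost every secret bit is recovered; as the error in the estimated answers grows, accuracy drifts toward guessing. This loosely mirrors the intuition that statistics estimated more precisely from synthetic data can leak more about every record, not only outliers.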
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Data-Driven Disease Surveillance
