The Inadequacy of Similarity-based Privacy Metrics: Privacy Attacks against "Truly Anonymous" Synthetic Datasets
Georgi Ganev, Emiliano De Cristofaro

TL;DR
This paper critically evaluates the reliability of similarity-based privacy metrics for synthetic data, demonstrating that they fail to prevent privacy breaches and introducing a reconstruction attack that exposes individual data points even when the synthetic datasets pass these metrics.
Contribution
It reveals the inadequacy of common similarity-based privacy metrics and proposes ReconSyn, a reconstruction attack exposing privacy leaks in synthetic datasets.
Findings
Severe privacy violations occur even when metrics pass
ReconSyn recovers 78-100% of outliers with minimal access
Applying DP to models does not prevent ReconSyn attacks
Abstract
Generative models producing synthetic data are meant to provide a privacy-friendly approach to releasing data. However, their privacy guarantees are only considered robust when models satisfy Differential Privacy (DP). Alas, this is not a ubiquitous standard, as many leading companies (and, in fact, research papers) use ad-hoc privacy metrics based on testing the statistical similarity between synthetic and real data. In this paper, we examine the privacy metrics used in real-world synthetic data deployments and demonstrate their unreliability in several ways. First, we provide counter-examples where severe privacy violations occur even if the privacy tests pass and instantiate accurate membership and attribute inference attacks with minimal cost. We then introduce ReconSyn, a reconstruction attack that generates multiple synthetic datasets that are considered private by the metrics…
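To make the abstract's first claim concrete, the following is a minimal toy sketch of how a similarity-style privacy test can pass while the data still leaks. Everything in it is an assumption made for illustration and is not taken from the paper: the test (`passes_similarity_test`, which only checks that no synthetic record lies within a threshold distance of a real one), the threshold, the records, and the deliberately leaky "generator" that shifts every real record by a fixed offset.

```python
import math

def min_distance_to_real(synthetic, real):
    """Smallest Euclidean distance from any synthetic record to any real record."""
    return min(math.dist(s, r) for s in synthetic for r in real)

def passes_similarity_test(synthetic, real, threshold):
    """Hypothetical similarity-style test (an assumption, not the paper's metric):
    the data counts as 'private' if no synthetic record is within `threshold`
    of any real record."""
    return min_distance_to_real(synthetic, real) > threshold

# Made-up "real" records, for illustration only.
real = [(10.0, 20.0), (30.0, 45.0), (55.0, 70.0)]

# A deliberately leaky "generator": every real record shifted by a fixed offset.
OFFSET = (1000.0, 1000.0)
synthetic = [(x + OFFSET[0], y + OFFSET[1]) for x, y in real]

# The test passes because every synthetic record is far from every real one.
test_passed = passes_similarity_test(synthetic, real, threshold=5.0)

# An adversary who knows or guesses the offset recovers the real data exactly.
recovered = [(x - OFFSET[0], y - OFFSET[1]) for x, y in synthetic]

print("similarity test passed:", test_passed)
print("real data fully recovered:", recovered == real)
```

The toy generator is intentionally extreme, but it shows the general point: a test that only measures how far synthetic records sit from real ones says nothing about how much information those records carry about individuals.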
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
