The Inadequacy of Similarity-based Privacy Metrics: Privacy Attacks against "Truly Anonymous" Synthetic Datasets

Georgi Ganev; Emiliano De Cristofaro

arXiv:2312.05114·cs.CR·May 9, 2025·5 cites

TL;DR

This paper critically evaluates the reliability of similarity-based privacy metrics for synthetic data, demonstrating their failure to prevent privacy breaches and introducing a reconstruction attack that exposes individual data points despite passing these metrics.

Contribution

It reveals the inadequacy of common similarity-based privacy metrics and proposes ReconSyn, a reconstruction attack exposing privacy leaks in synthetic datasets.

Findings

01. Severe privacy violations occur even when metrics pass

02. ReconSyn recovers 78-100% of outliers with minimal access

03. Applying DP to models does not prevent ReconSyn attacks

Abstract

Generative models producing synthetic data are meant to provide a privacy-friendly approach to releasing data. However, their privacy guarantees are only considered robust when models satisfy Differential Privacy (DP). Alas, this is not a ubiquitous standard, as many leading companies (and, in fact, research papers) use ad-hoc privacy metrics based on testing the statistical similarity between synthetic and real data. In this paper, we examine the privacy metrics used in real-world synthetic data deployments and demonstrate their unreliability in several ways. First, we provide counter-examples where severe privacy violations occur even if the privacy tests pass and instantiate accurate membership and attribute inference attacks with minimal cost. We then introduce ReconSyn, a reconstruction attack that generates multiple synthetic datasets that are considered private by the metrics…
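To make the abstract's point about passing tests concrete, here is a minimal toy sketch in Python. It is not taken from the paper: the distance-to-closest-record check, the pass rule in `passes_dcr_test`, the data, and the `OFFSET` transformation are all assumptions chosen for illustration. It shows a synthetic dataset that a similarity-based check would accept even though every training record can be recovered exactly.

```python
import math
import statistics

def dcr(points, reference):
    """Euclidean distance from each point to its closest record in `reference`."""
    return [min(math.dist(p, r) for r in reference) for p in points]

def passes_dcr_test(synthetic, train, holdout):
    """Hypothetical pass rule (an assumption, not the paper's definition):
    synthetic records must be, in the median, no closer to the training
    data than unseen holdout records are."""
    return statistics.median(dcr(synthetic, train)) >= statistics.median(dcr(holdout, train))

# Toy data, invented for this example.
train = [(0, 0), (10, 0), (0, 10), (10, 10)]
holdout = [(1, 1), (9, 1), (1, 9), (9, 9)]

# A deliberately leaky "generator": every training record shifted by a fixed offset.
OFFSET = 3
synthetic = [(x + OFFSET, y) for x, y in train]

# An adversary who knows or guesses the offset inverts the shift.
recovered = [(x - OFFSET, y) for x, y in synthetic]

print(passes_dcr_test(synthetic, train, holdout))  # True: the similarity check passes
print(recovered == train)                          # True: all training records are recovered
```

The shift is far cruder than any real generative model, but it isolates the underlying weakness: a distance threshold measures how close synthetic records are to real ones, not how much information about the real records they reveal.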

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI