The DCR Delusion: Measuring the Privacy Risk of Synthetic Data
Zexi Yao, Nataša Krčo, Georgi Ganev, Yves-Alexandre de Montjoye

TL;DR
This paper critically evaluates the effectiveness of distance-based metrics like DCR for assessing privacy risks in synthetic data, demonstrating their failures and advocating for membership inference attacks as a more reliable standard.
Contribution
The study reveals that DCR and similar proxy metrics are unreliable for privacy assessment, and emphasizes the need to adopt MIAs for accurate privacy evaluation of synthetic datasets.
Findings
Distance-based metrics often fail to detect privacy leakage.
Datasets deemed private by proxy metrics are vulnerable to MIAs.
Proxy metrics are flawed and miss actual privacy risks.
Abstract
Synthetic data has become an increasingly popular way to share data without revealing sensitive information. Though Membership Inference Attacks (MIAs) are widely considered the gold standard for empirically assessing the privacy of a synthetic dataset, practitioners and researchers often rely on simpler proxy metrics such as Distance to Closest Record (DCR). These metrics estimate privacy by measuring the similarity between the training data and generated synthetic data. This similarity is also compared against that between the training data and a disjoint holdout set of real records to construct a binary privacy test. If the synthetic data is not more similar to the training data than the holdout set is, it passes the test and is considered private. In this work we show that, while computationally inexpensive, DCR and other distance-based metrics fail to identify privacy leakage.…
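The binary test described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the Euclidean distance, the use of a quantile (the median by default) as the summary statistic, and the function names `dcr` and `dcr_privacy_test` are all assumptions made for the example. It assumes purely numerical records stored as NumPy arrays.

```python
import numpy as np


def dcr(queries: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Distance from each query row to its closest row in `reference`.

    Euclidean distance is an assumption; other metrics are possible.
    """
    # Pairwise differences via broadcasting: shape (n_queries, n_reference, n_features)
    diffs = queries[:, None, :] - reference[None, :, :]
    distances = np.sqrt((diffs ** 2).sum(axis=2))
    return distances.min(axis=1)


def dcr_privacy_test(train, synthetic, holdout, quantile=0.5):
    """Hypothetical DCR-style binary test.

    Passes when the synthetic data is not closer to the training data
    than the holdout data is, comparing one quantile of each DCR distribution.
    """
    synth_stat = float(np.quantile(dcr(synthetic, train), quantile))
    holdout_stat = float(np.quantile(dcr(holdout, train), quantile))
    return synth_stat >= holdout_stat, synth_stat, holdout_stat


rng = np.random.default_rng(0)
train = rng.normal(size=(200, 5))
holdout = rng.normal(size=(200, 5))

# Synthetic data that simply copies the training records fails the test.
memorized_passes, _, _ = dcr_privacy_test(train, train.copy(), holdout)

# Synthetic data far from every training record passes the test.
shifted_passes, _, _ = dcr_privacy_test(train, train + 100.0, holdout)
```

A test of this kind only checks aggregate closeness, so it catches extreme cases such as verbatim copies. As the abstract argues, a passing result is not evidence that the synthetic data is free of privacy leakage.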
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Digital and Cyber Forensics
Methods: Diffusion · Sparse Evolutionary Training
