SynQP: A Framework and Metrics for Evaluating the Quality and Privacy Risk of Synthetic Data
Bing Hu, Yixin Li, Asma Bahamyirou, Helen Chen

TL;DR
SynQP is an open framework designed to benchmark privacy risks in synthetic data generation, especially for health applications, using simulated data and new metrics to improve evaluation transparency and safety.
Contribution
The paper introduces SynQP, a novel open framework and metrics for privacy evaluation of synthetic data, addressing the lack of benchmark datasets and fair privacy assessment methods.
Findings
SynQP enables effective benchmarking of privacy risks in synthetic data.
The proposed identity disclosure risk metric estimates privacy risk more accurately than existing approaches.
Differential privacy consistently reduces privacy risks below regulatory thresholds.
Abstract
The use of synthetic data in health applications raises privacy concerns, yet the lack of open frameworks for privacy evaluations has slowed its adoption. A major challenge is the absence of accessible benchmark datasets for evaluating privacy risks, due to difficulties in acquiring sensitive data. To address this, we introduce SynQP, an open framework for benchmarking privacy in synthetic data generation (SDG) using simulated sensitive data, ensuring that original data remains confidential. We also highlight the need for privacy metrics that fairly account for the probabilistic nature of machine learning models. As a demonstration, we use SynQP to benchmark CTGAN and propose a new identity disclosure risk metric that offers a more accurate estimation of privacy risks compared to existing approaches. Our work provides a critical tool for improving the transparency and reliability of…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Ethics and Social Impacts of AI
