RAPID: Risk of Attribute Prediction-Induced Disclosure in Synthetic Microdata
Matthias Templ, Oscar Thees, Roman Müller

TL;DR
This paper introduces RAPID, a new risk measure for synthetic microdata that quantifies an adversary's ability to infer sensitive attributes, providing a practical and interpretable disclosure risk assessment method.
Contribution
RAPID offers a novel, robust, and interpretable metric for attribute inference risk in synthetic data, independent of specific generators and applicable with various learning algorithms.
Findings
RAPID effectively quantifies attribute inference risk in synthetic data.
The metric is robust to class imbalance and independent of the synthesis method.
Empirical results demonstrate RAPID's utility in evaluating synthetic data privacy.
Abstract
Statistical data anonymization increasingly relies on fully synthetic microdata, for which classical identity disclosure measures are less informative than an adversary's ability to infer sensitive attributes from released data. We introduce RAPID (Risk of Attribute Prediction-Induced Disclosure), a disclosure risk measure that directly quantifies inferential vulnerability under a realistic attack model. An adversary trains a predictive model solely on the released synthetic data and applies it to real individuals' quasi-identifiers. For continuous sensitive attributes, RAPID reports the proportion of records whose predicted values fall within a specified relative error tolerance. For categorical attributes, we propose a baseline-normalized confidence score that measures how much more confident the attacker is about the true class than would be expected from class prevalence alone, and…
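The continuous case described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name rapid_continuous, the tolerance parameter and its default, the ordinary least squares attacker, and the small eps guard against zero true values are all assumptions made for illustration. The abstract only states that the attacker trains on synthetic data alone and that the measure is the share of real records predicted within a relative error tolerance.

```python
import numpy as np


def rapid_continuous(X_syn, y_syn, X_real, y_real, tolerance=0.1, eps=1e-12):
    """Share of real records whose sensitive value is predicted within
    a relative error tolerance (hypothetical helper, not from the paper)."""
    X_syn = np.asarray(X_syn, dtype=float)
    X_real = np.asarray(X_real, dtype=float)
    y_syn = np.asarray(y_syn, dtype=float)
    y_real = np.asarray(y_real, dtype=float)

    # Attacker model: ordinary least squares fitted on synthetic data only.
    # Any other learner could be substituted here; OLS is an assumption.
    A_syn = np.column_stack([np.ones(len(X_syn)), X_syn])
    coef, *_ = np.linalg.lstsq(A_syn, y_syn, rcond=None)

    # Apply the fitted model to the quasi-identifiers of real individuals.
    A_real = np.column_stack([np.ones(len(X_real)), X_real])
    y_pred = A_real @ coef

    # Relative error per record; eps avoids division by zero (assumption).
    rel_err = np.abs(y_pred - y_real) / np.maximum(np.abs(y_real), eps)

    # Proportion of records predicted within the tolerance.
    return float(np.mean(rel_err <= tolerance))


# Toy data: synthetic records follow y = 2x + 1 exactly; one real record deviates.
X_syn = [[1.0], [2.0], [3.0], [4.0]]
y_syn = [3.0, 5.0, 7.0, 9.0]
X_real = [[1.0], [2.0], [3.0], [4.0]]
y_real = [3.0, 5.0, 7.0, 20.0]

risk = rapid_continuous(X_syn, y_syn, X_real, y_real, tolerance=0.1)
print(risk)  # three of the four real records are predicted within 10%
```

A higher value means more real records can be inferred closely from the synthetic release. The categorical score is not sketched here because the abstract is cut off before it is fully specified.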
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI) · Imbalanced Data Classification Techniques
