RAPID: Risk of Attribute Prediction-Induced Disclosure in Synthetic Microdata

Matthias Templ; Oscar Thees; Roman Müller

arXiv:2602.09235 · cs.LG · February 11, 2026

TL;DR

This paper introduces RAPID, a new risk measure for synthetic microdata that quantifies an adversary's ability to infer sensitive attributes, providing a practical and interpretable disclosure risk assessment method.

Contribution

RAPID offers a novel, robust, and interpretable metric for attribute inference risk in synthetic data, independent of specific generators and applicable with various learning algorithms.

Findings

01. RAPID effectively quantifies attribute inference risk in synthetic data.

02. The metric is robust to class imbalance and independent of the synthesis method.

03. Empirical results demonstrate RAPID's utility in evaluating synthetic data privacy.

Abstract

Statistical data anonymization increasingly relies on fully synthetic microdata, for which classical identity disclosure measures are less informative than an adversary's ability to infer sensitive attributes from released data. We introduce RAPID (Risk of Attribute Prediction-Induced Disclosure), a disclosure risk measure that directly quantifies inferential vulnerability under a realistic attack model. An adversary trains a predictive model solely on the released synthetic data and applies it to real individuals' quasi-identifiers. For continuous sensitive attributes, RAPID reports the proportion of records whose predicted values fall within a specified relative error tolerance. For categorical attributes, we propose a baseline-normalized confidence score that measures how much more confident the attacker is about the true class than would be expected from class prevalence alone, and…
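
The continuous-attribute case described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-neighbour attacker, the function names `knn_predict` and `rapid_continuous`, the 10% default tolerance, the zero-value guard, and the toy data are all assumptions made for illustration. The abstract only states that the attacker trains on synthetic data, predicts for real quasi-identifiers, and that the measure is the share of records predicted within a relative error tolerance.

```python
from math import dist


def knn_predict(train_X, train_y, query, k=1):
    """Predict a numeric value as the mean target of the k nearest training rows.

    Hypothetical stand-in for the attacker's model; any learner could be used.
    """
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def rapid_continuous(synth_X, synth_y, real_X, real_y, tol=0.10, k=1):
    """Share of real records whose predicted sensitive value lies within
    a relative error of `tol` of the true value.

    The model sees only synthetic data (synth_X, synth_y) and is applied
    to the quasi-identifiers of real individuals (real_X).
    """
    hits = 0
    for x, y_true in zip(real_X, real_y):
        y_hat = knn_predict(synth_X, synth_y, x, k)
        # Assumption of this sketch: fall back to absolute error when y_true is 0.
        denom = abs(y_true) if y_true != 0 else 1.0
        if abs(y_hat - y_true) / denom <= tol:
            hits += 1
    return hits / len(real_X)


# Toy data (invented): quasi-identifiers are (age, education level),
# the sensitive attribute is income.
synth_X = [(30, 1), (40, 2), (50, 3), (60, 4)]
synth_y = [30000, 40000, 50000, 60000]
real_X = [(31, 1), (49, 3), (58, 4)]
real_y = [31000, 50500, 90000]

risk = rapid_continuous(synth_X, synth_y, real_X, real_y, tol=0.10, k=1)
print(f"Share of real records at risk: {risk:.3f}")
```

In this toy run, two of the three real records are predicted within 10% of their true income, so the sketch reports a risk of about 0.667. The categorical score is not sketched here because the abstract is cut off before its definition is complete.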

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI) · Imbalanced Data Classification Techniques