Privacy risk from synthetic data: practical proposals
Gillian M Raab

TL;DR
This paper introduces practical measures for assessing privacy risk in synthetic data, helping data custodians decide whether to release it and identify, and possibly exclude, risky records.
Contribution
It proposes and evaluates new disclosure risk measures for synthetic data, with methods implemented in the synthpop R package.
Findings
Effective risk measures identified for synthetic data
Methods to detect and exclude risky records
Insights into disclosure risks from real data sets
Abstract
This paper proposes and compares measures of identity and attribute disclosure risk for synthetic data. Data custodians can use the methods proposed here to inform the decision as to whether to release synthetic versions of confidential data. Different measures are evaluated on two data sets. Insight into the measures is obtained by examining the details of the records identified as posing a disclosure risk. This leads to methods to identify, and possibly exclude, apparently risky records where the identification or attribution would be expected by someone with background knowledge of the data. The methods described are available as part of the synthpop package for R.
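To make the idea of an identity disclosure measure concrete, here is a minimal sketch of one simple heuristic: counting original records whose combination of key variables is unique in the original data and also appears exactly once in the synthetic data. This is an illustration only. The abstract does not define its measures, so this heuristic may not match any of them, and it is not the synthpop API. The function name `replicated_uniques` and the toy records are hypothetical.

```python
from collections import Counter


def replicated_uniques(original, synthetic, keys):
    """Return indices of original records whose key combination is unique
    in the original data and also unique in the synthetic data.

    Assumption: this is an illustrative heuristic, not a measure defined
    in the abstract and not a function from synthpop.
    """
    def key_of(record):
        # Combine the chosen key variables into one hashable tuple.
        return tuple(record[k] for k in keys)

    orig_counts = Counter(key_of(r) for r in original)
    syn_counts = Counter(key_of(r) for r in synthetic)
    return [
        i for i, r in enumerate(original)
        if orig_counts[key_of(r)] == 1 and syn_counts[key_of(r)] == 1
    ]


# Hypothetical toy data: each record is a dict of key variables.
original = [
    {"age": 34, "sex": "F", "region": "North"},
    {"age": 34, "sex": "F", "region": "North"},
    {"age": 71, "sex": "M", "region": "South"},
    {"age": 52, "sex": "F", "region": "East"},
]
synthetic = [
    {"age": 34, "sex": "F", "region": "North"},
    {"age": 71, "sex": "M", "region": "South"},
    {"age": 52, "sex": "M", "region": "East"},
]

risky = replicated_uniques(original, synthetic, keys=["age", "sex", "region"])
risk_percent = 100 * len(risky) / len(original)
print(risky, risk_percent)
```

Returning the indices rather than only a percentage lets a custodian inspect the flagged records individually, which is in the spirit of the abstract's point about examining the records identified as risky before deciding whether to exclude them.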
Taxonomy
Topics: Privacy-Preserving Technologies in Data
