Identification Risks Evaluation of Partially Synthetic Data with the $\texttt{IdentificationRiskCalculation}$ R Package
Ryan Hornby, Jingchen Hu

TL;DR
This paper introduces a new R package, IdentificationRiskCalculation, for evaluating the identification risk of partially synthetic data. It focuses on synthesized continuous variables and examines how the choice of radius, the choice of synthesized variables, and the number of synthetic datasets affect risk.
Contribution
The paper extends existing methods by incorporating a radius parameter for risk calculation and provides an R package to facilitate risk assessment of synthesized data.
Findings
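The R package computes identification risks for synthesized continuous variables, demonstrated on a data sample from the Consumer Expenditure Surveys.
Risk and data utility are influenced by the choice of radius, the choice of synthesized variables, and the number of synthetic datasets.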
Recommendations are provided for statistical agencies on synthesizing and evaluating risks.
Abstract
We extend a general approach to evaluating identification risk of synthesized variables in partially synthetic data. For multiple continuous synthesized variables, we introduce the use of a radius in the construction of identification risk probability of each target record, and illustrate with working examples. We create the R package to aid researchers and data disseminators in performing these identification risk evaluation calculations. We demonstrate our methods through the R package with applications to a data sample from the Consumer Expenditure Surveys, and discuss the impacts on risk and data utility of 1) the choice of radius, 2) the choice of synthesized variables, and 3) the choice of number of synthetic datasets. We give recommendations for statistical agencies for synthesizing and evaluating identification risk of continuous…
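To make the role of the radius concrete, here is a minimal Python sketch of a radius-based matching risk. It is not the API of the IdentificationRiskCalculation package, and the details are assumptions rather than statements from the abstract: the radius is treated as relative (a synthetic value matches a target when it lies within `radius` times the target's true value), a target's risk is taken as 1/c when its own synthetic record is among its c candidate matches and 0 otherwise, and risks are averaged over multiple synthetic datasets. The function names `record_risks` and `average_risks` are hypothetical.

```python
import numpy as np


def record_risks(original, synthetic, radius):
    """Per-record risk for one synthetic dataset (illustrative assumption).

    original, synthetic: arrays of shape (n_records, n_synthesized_vars);
    row i of `synthetic` is the synthesized version of row i of `original`.
    radius: relative radius r (assumed); a synthetic value matches when it
    lies within r * |true value| of the target's true value.
    """
    original = np.asarray(original, dtype=float)
    synthetic = np.asarray(synthetic, dtype=float)

    # tolerance[i, k] = r * |y_ik| for target i and variable k
    tolerance = radius * np.abs(original)

    # diff[i, j, k] = |synthetic_jk - original_ik|
    diff = np.abs(synthetic[None, :, :] - original[:, None, :])

    # match[i, j]: synthetic record j is within the radius of target i
    # on every synthesized variable
    match = np.all(diff <= tolerance[:, None, :], axis=2)

    c = match.sum(axis=1)   # number of candidate matches per target
    t = np.diag(match)      # is the target's own record a candidate?

    # Risk is 1/c when the true record is among the candidates, else 0.
    return np.where(t, 1.0 / np.maximum(c, 1), 0.0)


def average_risks(original, synthetic_datasets, radius):
    """Average the per-record risks over several synthetic datasets."""
    risks = [record_risks(original, s, radius) for s in synthetic_datasets]
    return np.mean(risks, axis=0)


# Toy data: three records, one synthesized continuous variable.
original = [[100.0], [105.0], [200.0]]
synthetic = [[102.0], [120.0], [195.0]]

narrow = record_risks(original, synthetic, radius=0.10)
wide = record_risks(original, synthetic, radius=0.25)
```

In this toy example, widening the radius from 0.10 to 0.25 brings a second candidate into range for the first two targets, so a target that was uniquely matched now shares its match with another record. A larger radius can also raise risk: the second target's own synthetic record falls outside the narrow radius but inside the wide one.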
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Statistical Methods and Inference · Data Analysis with R
