TL;DR
This paper introduces a risk-aware Bayesian data synthesis method that downweights each record's likelihood contribution according to its identification risk, using a weighted pseudo likelihood, to strengthen privacy protection in public data releases while preserving utility.
Contribution
It develops a novel risk-adjusted pseudo likelihood approach for Bayesian data synthesis that mitigates re-identification risks while maintaining data utility.
Findings
The marginally risk-adjusted synthesizer improves privacy protection overall, although identification risks increase for some moderate-risk records.
Pairwise risk-based weighting reduces re-identification risk more effectively.
Method enhances data utility while controlling privacy risks.
Abstract
Statistical agencies utilize models to synthesize respondent-level data for release to the public for privacy protection. In this work, we efficiently induce privacy protection into any Bayesian synthesis model by employing a pseudo likelihood that exponentiates each likelihood contribution by an observation record-indexed weight in [0, 1], defined to be inversely proportional to the identification risk for that record. We start with the marginal probability of identification risk for a record, which is composed as the probability that the identity of the record may be disclosed. Our application to the Consumer Expenditure Surveys (CE) of the U.S. Bureau of Labor Statistics demonstrates that the marginally risk-adjusted synthesizer provides an overall improved privacy protection; however, the identification risks actually increase for some moderate-risk records after risk-adjusted…
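The weighting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a univariate normal synthesis model and a hypothetical linear map from risk to weight, w = 1 - risk clipped to [0, 1]. The abstract only states that the weights lie in [0, 1] and decrease with identification risk, so the function names and the exact mapping here are illustrative assumptions.

```python
import math


def risk_to_weight(risk):
    """Map an identification risk to a weight in [0, 1].

    Assumption: a simple linear map w = 1 - risk, clipped to [0, 1].
    The paper's actual mapping may differ.
    """
    return min(1.0, max(0.0, 1.0 - risk))


def normal_logpdf(y, mu, sigma):
    """Log density of a normal distribution with mean mu and std dev sigma."""
    return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2.0 * sigma ** 2)


def pseudo_log_likelihood(ys, risks, mu, sigma):
    """Weighted pseudo log-likelihood: sum_i w_i * log p(y_i | mu, sigma).

    Raising each likelihood contribution to the power w_i is the same as
    multiplying its log by w_i, so high-risk records (small w_i) have less
    influence on the fitted synthesis model.
    """
    return sum(
        risk_to_weight(r) * normal_logpdf(y, mu, sigma)
        for y, r in zip(ys, risks)
    )
```

With all risks equal to 0 the function reduces to the ordinary log-likelihood, and with all risks equal to 1 every record is fully downweighted and the pseudo log-likelihood is 0. In a Bayesian synthesizer this quantity would replace the log-likelihood term in the posterior.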
