Bayesian Marked Point Process Modeling for Generating Fully Synthetic Public Use Data with Point-Referenced Geography
Harrison Quick, Scott H. Holan, Christopher K. Wikle, and Jerome P. Reiter

TL;DR
This paper introduces a Bayesian marked point process model to generate fully synthetic spatial data, preserving statistical and spatial properties while protecting confidentiality, offering an alternative to traditional data redaction methods.
Contribution
The paper presents a novel Bayesian marked point process approach for creating fully synthetic spatial data that maintains data utility and confidentiality.
Findings
Successfully modeled mortality data from Durham, North Carolina.
Generated synthetic data preserving spatial dependence.
Outperformed traditional coarsening and perturbation methods.
Abstract
Many data stewards collect confidential data that include fine geography. When sharing these data with others, data stewards strive to disseminate data that are informative for a wide range of spatial and non-spatial analyses while simultaneously protecting the confidentiality of data subjects' identities and attributes. Typically, data stewards meet this challenge by coarsening the resolution of the released geography and, as needed, perturbing the confidential attributes. When done with high intensity, these redaction strategies can result in released data with poor analytic quality. We propose an alternative dissemination approach based on fully synthetic data. We generate data using marked point process models that can maintain both the statistical properties and the spatial dependence structure of the confidential data. We illustrate the approach using data consisting of mortality…
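To make the idea of a marked point process concrete, here is a minimal sketch, not the authors' model. It simulates locations on the unit square from an inhomogeneous Poisson process by thinning, then attaches a location-dependent mark to each point. The intensity surface, the bound `LAMBDA_MAX`, and the "age" mark are all hypothetical choices made for illustration; the paper's actual model specification is not given in this summary.

```python
import math
import random

LAMBDA_MAX = 200.0  # assumed upper bound on the intensity over the unit square


def intensity(x, y):
    """Hypothetical intensity surface: a single bump centred at (0.5, 0.5)."""
    return LAMBDA_MAX * math.exp(-((x - 0.5) ** 2 + (y - 0.5) ** 2) / 0.05)


def sample_poisson(rng, mean):
    """Draw a Poisson(mean) count by counting unit-rate arrivals before time `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def simulate_marked_points(rng):
    """Simulate (x, y, mark) triples on the unit square via thinning."""
    # Step 1: candidates from a homogeneous process with rate LAMBDA_MAX (area = 1).
    n_candidates = sample_poisson(rng, LAMBDA_MAX)
    points = []
    for _ in range(n_candidates):
        x, y = rng.random(), rng.random()
        # Step 2: keep each candidate with probability intensity / LAMBDA_MAX.
        if rng.random() < intensity(x, y) / LAMBDA_MAX:
            # Step 3: attach a mark whose distribution depends on location
            # (hypothetical: an "age" whose mean increases with x).
            age = rng.gauss(60.0 + 20.0 * x, 10.0)
            points.append((x, y, age))
    return points


synthetic = simulate_marked_points(random.Random(0))
```

In a fully synthetic release, every location and mark would come from a model fitted to the confidential data rather than from fixed functions like the ones assumed here.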
