Multiple imputation for sharing precise geographies in public use data
Hao Wang, Jerome P. Reiter

TL;DR
This paper introduces a multiple imputation approach to protect geographic confidentiality in public data releases, enabling sharing of detailed location data while mitigating disclosure risks.
Contribution
It proposes a novel method using regression trees for generating simulated geographies and attributes, improving data privacy without sacrificing analytical utility.
Findings
The approach is effective at protecting geographic confidentiality.
It provides tools for generating and assessing simulated geographies.
It is demonstrated on cause-of-death data from Durham, North Carolina.
Abstract
When releasing data to the public, data stewards are ethically and often legally obligated to protect the confidentiality of data subjects' identities and sensitive attributes. They also strive to release data that are informative for a wide range of secondary analyses. Achieving both objectives is particularly challenging when data stewards seek to release highly resolved geographical information. We present an approach for protecting the confidentiality of data with geographic identifiers based on multiple imputation. The basic idea is to convert geography to latitude and longitude, estimate a bivariate response model conditional on attributes, and simulate new latitude and longitude values from these models. We illustrate the proposed methods using data describing causes of death in Durham, North Carolina. In the context of the application, we present a straightforward tool for…
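The basic idea in the abstract (model location conditional on attributes, then simulate new coordinates) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a sequential factorization, drawing latitude given attributes and then longitude given latitude and attributes, and it substitutes a crude stand-in for regression trees: records sharing the same attribute combination are treated as one "leaf", and new values are resampled from donors in that leaf. The function name `synthesize_locations`, the parameters `m` and `k`, the record layout, and the toy coordinates are all hypothetical.

```python
import random
from collections import defaultdict


def synthesize_locations(records, m=5, k=3, seed=0):
    """Return m synthetic copies of `records` with simulated (lat, lon).

    records: list of dicts with keys "attrs" (a tuple), "lat", "lon".
    Assumption: each distinct attribute tuple acts as one tree leaf.
    """
    rng = random.Random(seed)

    # Group donor coordinates by attribute combination (the stand-in "leaves").
    leaves = defaultdict(list)
    for r in records:
        leaves[r["attrs"]].append((r["lat"], r["lon"]))

    copies = []
    for _ in range(m):
        synthetic = []
        for r in records:
            donors = leaves[r["attrs"]]
            # Step 1: draw a latitude from donors with the same attributes.
            new_lat = rng.choice(donors)[0]
            # Step 2: draw a longitude from the k donors closest in latitude,
            # a rough approximation to conditioning on latitude and attributes.
            nearest = sorted(donors, key=lambda d: abs(d[0] - new_lat))[:k]
            new_lon = rng.choice(nearest)[1]
            synthetic.append(
                {"attrs": r["attrs"], "lat": new_lat, "lon": new_lon}
            )
        copies.append(synthetic)
    return copies


# Hypothetical toy records; the coordinates are made up for illustration.
toy = [
    {"attrs": ("F", "65+"), "lat": 35.99, "lon": -78.90},
    {"attrs": ("F", "65+"), "lat": 36.01, "lon": -78.92},
    {"attrs": ("F", "65+"), "lat": 36.03, "lon": -78.88},
    {"attrs": ("M", "40-64"), "lat": 35.95, "lon": -78.95},
    {"attrs": ("M", "40-64"), "lat": 35.97, "lon": -78.93},
]

copies = synthesize_locations(toy, m=3)
```

Each of the `m` copies keeps the original attributes and replaces only the coordinates, so an analyst would fit the same model to every copy and combine the results, as in other multiple imputation settings. A realistic version would fit actual regression trees and would need a disclosure risk assessment, since resampling within small leaves can reproduce true locations.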
