Releasing survey microdata with exact cluster locations and additional privacy safeguards
Till Koebe, Alejandra Arias-Salazar

TL;DR
This paper proposes a microdata sharing approach that combines the original survey data with synthetic data produced by generative models, reducing re-identification risk by a reported 60-80% while maintaining data utility.
Contribution
It introduces a new microdata dissemination method using generative models to enhance privacy safeguards without heavily compromising data utility.
Findings
Re-identification risk reduced by 60-80% with the proposed method.
Effective use of auxiliary satellite data to improve privacy protection.
Validated in experiments on data from the 2011 Costa Rican census.
Abstract
Household survey programs around the world publish fine-granular georeferenced microdata to support research on the interdependence of human livelihoods and their surrounding environment. To safeguard the respondents' privacy, micro-level survey data is usually (pseudo)-anonymized through deletion or perturbation procedures such as obfuscating the true location of data collection. This, however, poses a challenge to emerging approaches that augment survey data with auxiliary information on a local level. Here, we propose an alternative microdata dissemination strategy that leverages the utility of the original microdata with additional privacy safeguards through synthetically generated data using generative models. We back our proposal with experiments using data from the 2011 Costa Rican census and satellite-derived auxiliary information. Our strategy reduces the respondents'…
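To make the location obfuscation mentioned in the abstract concrete, the following is a minimal Python sketch of random displacement of a cluster coordinate. It illustrates the conventional perturbation step that the abstract describes as a challenge, not the authors' proposed strategy. The function names, the 5 km radius, and the example coordinates are assumptions chosen for illustration and do not come from the paper.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def displace(lat, lon, max_km, rng):
    """Shift a point by a random distance (at most max_km) in a random direction.

    Uses a flat-earth approximation, which is adequate for displacements
    of a few kilometres away from the poles.
    """
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    # The square root spreads points uniformly over the disc area
    # instead of concentrating them near the centre.
    dist_km = max_km * math.sqrt(rng.random())
    dlat = (dist_km / EARTH_RADIUS_KM) * math.cos(bearing)
    dlon = (dist_km / EARTH_RADIUS_KM) * math.sin(bearing) / math.cos(math.radians(lat))
    return lat + math.degrees(dlat), lon + math.degrees(dlon)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


if __name__ == "__main__":
    rng = random.Random(42)          # fixed seed so the run is repeatable
    true_lat, true_lon = 9.93, -84.08  # hypothetical cluster location
    new_lat, new_lon = displace(true_lat, true_lon, max_km=5.0, rng=rng)
    shift = haversine_km(true_lat, true_lon, new_lat, new_lon)
    print(f"displaced by {shift:.2f} km")
```

A displacement like this protects respondents but also breaks the link between a cluster and locally measured auxiliary data such as satellite imagery, which is the tension the abstract points to.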
