Improving the Utility of Poisson-Distributed, Differentially Private Synthetic Data via Prior Predictive Truncation with an Application to CDC WONDER
Harrison Quick

TL;DR
This paper improves the utility of differentially private synthetic epidemiologic data by using publicly available prior information to truncate the range of Poisson-distributed synthetic counts, which reduces the required privacy budget by several orders of magnitude while preserving key features of the data.
Contribution
It introduces a method that uses public data (county-level population counts and national death reports) to inform prior distributions on county-level death rates, improving the utility of differentially private synthetic data for public health applications.
Findings
Reduced privacy budget requirements by several orders of magnitude.
Improved preservation of geographic and demographic disparities.
Demonstrated effectiveness on cancer death data from Pennsylvania.
Abstract
CDC WONDER is a web-based tool for the dissemination of epidemiologic data collected by the National Vital Statistics System. While CDC WONDER has built-in privacy protections, they do not satisfy formal privacy protections such as differential privacy and thus are susceptible to targeted attacks. Given the importance of making high-quality public health data publicly available while preserving the privacy of the underlying data subjects, we aim to improve the utility of a recently developed approach for generating Poisson-distributed, differentially private synthetic data by using publicly available information to truncate the range of the synthetic data. Specifically, we utilize county-level population information from the U.S. Census Bureau and national death reports produced by the CDC to inform prior distributions on county-level death rates and infer reasonable ranges for…
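The truncation idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a gamma prior on a county's death rate, so that the prior predictive distribution of the death count is negative binomial, takes a central interval of that distribution as the "reasonable range", and then draws a synthetic count from a Poisson distribution restricted to that range. The function names, the prior parameters, the tail probability, and the example numbers are all hypothetical, and the privacy accounting is not shown.

```python
import math
import random


def neg_binom_pmf(y, shape, prob):
    """P(Y = y) for a negative binomial with the given shape and success probability."""
    log_p = (math.lgamma(y + shape) - math.lgamma(y + 1) - math.lgamma(shape)
             + shape * math.log(prob) + y * math.log1p(-prob))
    return math.exp(log_p)


def prior_predictive_bounds(population, shape, rate, tail=1e-6, max_count=10_000_000):
    """Central prior predictive interval for a death count.

    Assumed model (illustrative only):
        lam ~ Gamma(shape, rate), y | lam ~ Poisson(population * lam),
    which gives y ~ NegBin(shape, rate / (rate + population)) marginally.
    """
    prob = rate / (rate + population)
    lower, cdf = None, 0.0
    for y in range(max_count + 1):
        cdf += neg_binom_pmf(y, shape, prob)
        if lower is None and cdf >= tail / 2:
            lower = y
        if cdf >= 1 - tail / 2:
            return lower, y
    raise ValueError("upper bound not reached; increase max_count")


def sample_truncated_poisson(mean, lower, upper, rng):
    """Draw one value from a Poisson(mean) restricted to [lower, upper]."""
    support = range(lower, upper + 1)
    # Unnormalized log-weights of the Poisson pmf, shifted for numerical stability.
    log_w = [y * math.log(mean) - mean - math.lgamma(y + 1) for y in support]
    peak = max(log_w)
    weights = [math.exp(lw - peak) for lw in log_w]
    return rng.choices(support, weights=weights, k=1)[0]


# Hypothetical county: 50,000 residents, prior mean death rate 20 / 10,000 = 0.002.
lower, upper = prior_predictive_bounds(population=50_000, shape=20.0, rate=10_000.0)

# Hypothetical Poisson mean for the synthetic count (e.g. from a fitted model).
draw = sample_truncated_poisson(mean=95.0, lower=lower, upper=upper, rng=random.Random(0))
print(lower, upper, draw)
```

In this sketch the bounds depend only on public inputs (population and the prior), not on the confidential count, which is what lets a bounded range be used without consulting the protected data.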
