Protecting Privacy and Transforming COVID-19 Case Surveillance Datasets for Public Use
Brian Lee, Brandi Dupervil, Nicholas P. Deputy, Wil Duck, Stephen Soroka, Lyndsay Bottichio, Benjamin Silk, Jason Price, Patricia Sweeney, Jennifer Fuld, Todd Weber, Dan Pollock

TL;DR
This paper details CDC's approach to creating de-identified COVID-19 datasets that balance data utility with privacy, enabling broader public access and research while safeguarding individual confidentiality.
Contribution
It introduces a systematic method for producing privacy-protected, de-identified COVID-19 datasets from large-scale health data, facilitating public sharing and research.
Findings
Public datasets are available via Data.CDC.gov
Restricted datasets are accessible through a data use agreement on GitHub
Automated procedures enable timely data sharing
Abstract
Objectives: Federal open data initiatives that promote increased sharing of federally collected data are important for transparency, data quality, trust, and relationships with the public and state, tribal, local, and territorial (STLT) partners. These initiatives advance understanding of health conditions and diseases by providing data to more researchers, scientists, and policymakers for analysis, collaboration, and valuable use beyond CDC responders. This is particularly true for emerging conditions such as COVID-19, where we have much to learn and data needs are evolving. Since the beginning of the outbreak, CDC has collected person-level, de-identified data from jurisdictions and currently has over 8 million records, with more added each day. This paper describes how CDC designed and produces two de-identified public datasets from these collected data. Materials and Methods: Data…
