Synthetic Data for Social Good
Bill Howe (University of Washington), Julia Stoyanovich (Drexel University), Haoyue Ping (Drexel University), Bernease Herman (University of Washington), Matt Gee (Impact Lab)

TL;DR
This paper introduces DataSynthesizer, a tool for generating privacy-preserving synthetic data that is representative of original datasets, enabling safe and effective collaboration without risking privacy violations.
Contribution
It presents a user-friendly synthetic data generation tool that requires minimal parameter tuning, facilitating responsible data sharing for social good.
Findings
Generates synthetic data with strong privacy guarantees.
Maintains structural and statistical similarity to original data.
Enhances collaboration without compromising privacy.
Abstract
Data for good implies unfettered access to data. But data owners must be conservative about how, when, and why they share data or risk violating the trust of the people they aim to help, losing their funding, or breaking the law. Data sharing agreements can help prevent privacy violations, but require a level of specificity that is premature during preliminary discussions, and can take over a year to establish. We consider the generation and use of synthetic data to facilitate ad hoc collaborations involving sensitive data. A good synthetic dataset has two properties: it is representative of the original data, and it provides strong guarantees about privacy. In this paper, we discuss important use cases for synthetic data that challenge the state of the art in privacy-preserving data generation, and describe DataSynthesizer, a dataset generation tool that takes a sensitive dataset…
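The two properties named in the abstract, representativeness and a privacy guarantee, can be illustrated with a minimal sketch for a single categorical attribute. This is a generic noisy-histogram approach under stated assumptions, not necessarily the method DataSynthesizer uses: the function names `laplace_noise` and `synthesize_column`, the `epsilon` value, and the toy data are all hypothetical. It also assumes the attribute's domain is known in advance rather than derived from the sensitive data.

```python
import random
from collections import Counter

def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def synthesize_column(values, domain, epsilon, n_out, seed=0):
    """Sample n_out synthetic values from a Laplace-noised histogram.

    Assumption: `domain` is public and fixed; adding or removing one
    record changes one count by 1, so the noise scale is 1 / epsilon.
    """
    rng = random.Random(seed)
    counts = Counter(values)
    # Perturb each count, then clip negatives so the weights stay valid.
    noisy = [max(counts.get(v, 0) + laplace_noise(1.0 / epsilon, rng), 0.0)
             for v in domain]
    if sum(noisy) == 0:
        # Fall back to uniform weights if every noisy count was clipped.
        noisy = [1.0] * len(domain)
    # Draw synthetic values in proportion to the noisy counts.
    return rng.choices(domain, weights=noisy, k=n_out)

# Hypothetical toy data: a skewed three-category attribute.
original = ["A"] * 70 + ["B"] * 25 + ["C"] * 5
synthetic = synthesize_column(original, ["A", "B", "C"], epsilon=1.0, n_out=100)
print(Counter(synthetic))
```

A smaller `epsilon` adds more noise, which strengthens the privacy guarantee at the cost of a histogram that tracks the original less closely; this is the trade-off between the two properties.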
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Ethics and Social Impacts of AI · Mobile Crowdsensing and Crowdsourcing
