To democratize research with sensitive data, we should make synthetic data more accessible
Erik-Jan van Kesteren

TL;DR
This paper advocates for making synthetic data more accessible through tools, education, and case studies to promote open and reproducible research with sensitive data, rather than solely improving synthesis methods.
Contribution
The paper argues that the research community should shift its focus from developing better synthesis techniques toward accessible tools, peer education, and small-scale case studies, so that synthetic data is adopted more widely.
Findings
Synthetic data shows promise, but its adoption remains limited.
Accessibility and education are key to adoption.
Small-scale case studies can demonstrate utility.
Abstract
For over 30 years, synthetic data has been heralded as a promising solution to make sensitive datasets accessible. However, despite much research effort and several high-profile use-cases, the widespread adoption of synthetic data as a tool for open, accessible, reproducible research with sensitive data is still a distant dream. In this opinion, Erik-Jan van Kesteren, head of the ODISSEI Social Data Science team, argues that in order to progress towards widespread adoption of synthetic data as a privacy enhancing technology, the data science research community should shift focus away from developing better synthesis methods: instead, it should develop accessible tools, educate peers, and publish small-scale case studies.
Taxonomy
Topics: Data Quality and Management
