Synthetic Data -- what, why and how?
James Jordon, Lukasz Szpruch, Florimond Houssiau, Mirko Bottarelli, Giovanni Cherubin, Carsten Maple, Samuel N. Cohen, Adrian Weller

TL;DR
This paper provides an accessible overview of synthetic data technologies, emphasizing their importance for privacy, and discusses key concepts, benefits, and nuances for non-technical audiences.
Contribution
It offers a comprehensive, non-technical introduction to synthetic data, clarifying its concepts, applications, and privacy considerations, highlighting its potential and complexities.
Findings
Synthetic data enhances privacy preservation.
Synthetic data is useful across various domains.
Nuances in synthetic data deployment are critical.
Abstract
This explainer document aims to provide an overview of the current state of the rapidly expanding work on synthetic data technologies, with a particular focus on privacy. The article is intended for a non-technical audience, though some formal definitions have been given to provide clarity to specialists. This article is intended to enable the reader to quickly become familiar with the notion of synthetic data, as well as understand some of the subtle intricacies that come with it. We do believe that synthetic data is a very useful tool, and our hope is that this report highlights that, while drawing attention to nuances that can easily be overlooked in its deployment.
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Big Data Technologies and Applications
