Machine Learning for Synthetic Data Generation: A Review
Yingzhou Lu, Lulu Chen, Yuanyuan Zhang, Minjie Shen, Huazheng Wang, Xiao Wang, Capucine van Rechem, Tianfan Fu, Wenqi Wei

TL;DR
This review paper comprehensively examines how machine learning models generate synthetic data across various domains, highlighting methods, applications, privacy concerns, and future research directions.
Contribution
It provides a systematic overview of existing studies on machine learning-based synthetic data generation, emphasizing neural networks and addressing ethical considerations.
Findings
Neural networks and deep generative models are prominent in synthetic data creation.
Synthetic data helps overcome privacy and data access issues.
Challenges include ensuring data quality and fairness.
Abstract
Machine learning heavily relies on data, but real-world applications often encounter various data-related issues. These include data of poor quality, insufficient data points leading to under-fitting of machine learning models, and difficulties in data access due to concerns surrounding privacy, safety, and regulations. In light of these challenges, the concept of synthetic data generation emerges as a promising alternative that allows for data sharing and utilization in ways that real-world data cannot facilitate. This paper presents a comprehensive systematic review of existing studies that employ machine learning models for the purpose of generating synthetic data. The review encompasses various perspectives, starting with the applications of synthetic data generation, spanning computer vision, speech, natural language processing, healthcare, and business domains. Additionally, it…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Traffic Prediction and Management Techniques · Human Mobility and Location-Based Analysis
