Advancing Microdata Privacy Protection: A Review of Synthetic Data
Jingchen Hu, Claire McKay Bowen

TL;DR
This paper reviews the development, techniques, challenges, and applications of synthetic data generation for privacy protection, emphasizing its importance and future prospects in safeguarding record-level data.
Contribution
It offers a comprehensive overview of synthetic data generation methods and evaluation techniques, and discusses current challenges and future research directions in privacy protection.
Findings
Synthetic data effectively protects privacy in data sharing.
Current methods face challenges in data utility and realism.
Future research is needed to improve synthetic data techniques.
Abstract
Synthetic data generation is a powerful tool for privacy protection when considering public release of record-level data files. Initially proposed about three decades ago, it has generated significant research and application interest. To meet the pressing demand of data privacy protection in a variety of contexts, the field needs more researchers and practitioners. This review provides a comprehensive introduction to synthetic data, including technical details of their generation and evaluation. Our review also addresses the challenges and limitations of synthetic data, discusses practical applications, and provides thoughts for future work.
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Vehicular Ad Hoc Networks (VANETs) · Traffic Prediction and Management Techniques
