Synthetic Tabular Data Generation: A Comparative Survey for Modern Techniques
Raju Challagundla, Mohsen Dorodchi, Pu Wang, Minwoo Lee

TL;DR
This survey reviews recent methods for generating synthetic tabular data, focusing on preserving data utility, privacy, and complex feature relationships, and introduces a taxonomy and benchmark framework to guide future research and practical applications.
Contribution
The survey introduces a novel taxonomy based on generation objectives and proposes a benchmark framework, bridging theoretical methods with real-world privacy and utility needs.
Findings
Highlights the importance of preserving feature relationships
Emphasizes privacy guarantees in synthetic data generation
Provides a benchmark for evaluating synthetic tabular data methods
Abstract
As privacy regulations become more stringent and access to real-world data becomes increasingly constrained, synthetic data generation has emerged as a vital solution, especially for tabular datasets, which are central to domains like finance, healthcare and the social sciences. This survey presents a comprehensive and focused review of recent advances in synthetic tabular data generation, emphasizing methods that preserve complex feature relationships, maintain statistical fidelity, and satisfy privacy requirements. A key contribution of this work is the introduction of a novel taxonomy based on practical generation objectives, including intended downstream applications, privacy guarantees, and data utility, directly informing methodological design and evaluation strategies. Therefore, this review prioritizes the actionable goals that drive synthetic data creation, including…
Taxonomy
Topics: Data Management and Algorithms
