Tabular Data Generation Models: An In-Depth Survey and Performance Benchmarks with Extensive Tuning
G. Charbel N. Kindji (LACODAM), Lina Maria Rojas-Barahona, Elisa Fromont (LACODAM), Tanguy Urvoy

TL;DR
This paper provides a comprehensive evaluation of recent tabular data generation models. It emphasizes the importance of dataset-specific tuning, compares model performance, and highlights the effectiveness of diffusion-based models.
Contribution
It offers a unified benchmark with optimized hyperparameters and architectures across multiple datasets, revealing the impact of tuning and the relative performance of different models.
Findings
Dataset-specific tuning significantly improves model performance.
Diffusion-based models generally outperform others.
A reduced search space achieves comparable results at lower computational cost.
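The first and third findings can be illustrated with a minimal sketch of per-dataset tuning. It assumes exhaustive grid search over small discrete spaces. The hyperparameter names (`lr`, `batch_size`, `num_layers`), both search spaces, the dataset names, and `toy_score` are hypothetical placeholders, not the paper's actual search spaces, search strategy, or evaluation metrics. In a real benchmark, the scoring function would train a generator, sample synthetic rows, and measure their quality.

```python
import itertools
import math
from typing import Callable, Dict, List, Tuple

Config = Dict[str, float]
Space = Dict[str, List[float]]

# Hypothetical search spaces for illustration only; the paper's actual
# hyperparameters, ranges and search strategy are not given in this summary.
FULL_SPACE: Space = {
    "lr": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [256, 512, 1024, 4096],
    "num_layers": [2, 3, 4, 6],
}
REDUCED_SPACE: Space = {
    "lr": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [1024],
    "num_layers": [2, 4],
}


def grid(space: Space) -> List[Config]:
    """Enumerate every configuration in a discrete search space."""
    names = list(space)
    return [dict(zip(names, values)) for values in itertools.product(*space.values())]


def tune_per_dataset(
    datasets: List[str],
    space: Space,
    score: Callable[[str, Config], float],
) -> Tuple[Dict[str, Config], int]:
    """Pick the best config separately for each dataset (higher score is better).

    Returns the per-dataset winners and the total number of evaluations,
    i.e. the number of training runs a real benchmark would have to pay for.
    """
    configs = grid(space)
    best = {name: max(configs, key=lambda cfg: score(name, cfg)) for name in datasets}
    return best, len(configs) * len(datasets)


def toy_score(dataset: str, cfg: Config) -> float:
    # Hypothetical stand-in for "train a generator with cfg, sample synthetic
    # rows, and compute a quality metric". Each toy dataset prefers a
    # different learning rate, so no single global config suits both.
    preferred_lr = {"dataset_a": 1e-3, "dataset_b": 3e-4}[dataset]
    lr_gap = abs(math.log10(cfg["lr"]) - math.log10(preferred_lr))
    return -lr_gap - 0.01 * cfg["num_layers"]


if __name__ == "__main__":
    for label, space in [("full", FULL_SPACE), ("reduced", REDUCED_SPACE)]:
        best, cost = tune_per_dataset(["dataset_a", "dataset_b"], space, toy_score)
        print(label, cost, {d: (c["lr"], c["num_layers"]) for d, c in best.items()})
```

Each toy dataset ends up with its own best learning rate, and the reduced space recovers the same winners with 16 evaluations instead of 128. The toy score is constructed so that this happens; whether a reduced space matches the full one on real data is an empirical question of the kind the benchmark addresses.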
Abstract
The ability to train generative models that produce realistic, safe and useful tabular data is essential for data privacy, imputation, oversampling, explainability or simulation. However, generating tabular data is not straightforward due to its heterogeneity, non-smooth distributions, complex dependencies and imbalanced categorical features. Although diverse methods have been proposed in the literature, there is a need for a unified evaluation, under the same conditions, on a variety of datasets. This study addresses this need by fully considering the optimization of hyperparameters, feature encodings, and architectures. We investigate the impact of dataset-specific tuning on five recent model families for tabular data generation through an extensive benchmark on 16 datasets. These datasets vary in terms of size (an average of 80,000 rows), data types, and domains. We also propose a…
Taxonomy
Topics: Simulation Techniques and Applications
