Modeling Tabular data using Conditional GAN
Lei Xu, Maria Skoularidou, Alfredo Cuesta-Infante, Kalyan Veeramachaneni

TL;DR
This paper introduces TGAN, a conditional GAN model designed to generate realistic synthetic tabular data with mixed discrete and continuous columns. It outperforms Bayesian network baselines on most of the real datasets in the authors' benchmark.
Contribution
The paper proposes TGAN, a conditional GAN architecture tailored for tabular data, and designs a benchmark of 7 simulated and 8 real datasets with several Bayesian network baselines for comparing generative models.
Findings
TGAN outperforms Bayesian network baselines on most real datasets.
Other deep learning methods did not outperform the Bayesian baselines on most of the real datasets.
The benchmark includes 7 simulated and 8 real datasets for comprehensive evaluation.
Abstract
Modeling the probability distribution of rows in tabular data and generating realistic synthetic data is a non-trivial task. Tabular data usually contains a mix of discrete and continuous columns. Continuous columns may have multiple modes whereas discrete columns are sometimes imbalanced making the modeling difficult. Existing statistical and deep neural network models fail to properly model this type of data. We design TGAN, which uses a conditional generative adversarial network to address these challenges. To aid in a fair and thorough comparison, we design a benchmark with 7 simulated and 8 real datasets and several Bayesian network baselines. TGAN outperforms Bayesian methods on most of the real datasets whereas other deep learning methods could not.
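The two data properties named in the abstract, multi-modal continuous columns and imbalanced discrete columns, can be illustrated with a toy table. The sketch below is not taken from the paper: the function name `make_toy_table`, the column names, the mode centers, and the 95/5 class split are all assumptions chosen for illustration. It shows that the mean of a two-mode column falls in a region where almost no values lie, and that a minority category appears in only a small fraction of rows.

```python
import random
from collections import Counter


def make_toy_table(n_rows, seed=0):
    """Build a toy table with one continuous and one discrete column.

    Hypothetical illustration only; not the paper's benchmark data.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        # Continuous column: mixture of two Gaussians, so it has two modes.
        center = -5.0 if rng.random() < 0.5 else 5.0
        amount = rng.gauss(center, 1.0)
        # Discrete column: roughly 95% "common" and 5% "rare" (assumed split).
        label = "rare" if rng.random() < 0.05 else "common"
        rows.append((amount, label))
    return rows


table = make_toy_table(10_000)

# Imbalance: count how often each category occurs.
label_counts = Counter(label for _, label in table)

# Multi-modality: the overall mean sits in the valley between the two modes,
# so very few values lie close to it.
mean_amount = sum(amount for amount, _ in table) / len(table)
near_mean = sum(abs(amount - mean_amount) < 2.0 for amount, _ in table) / len(table)

print("category counts:", dict(label_counts))
print("mean of continuous column:", round(mean_amount, 3))
print("fraction of values within 2.0 of the mean:", near_mean)
```

A single-Gaussian summary of the continuous column would center on a value that almost never occurs, and a model trained on uniformly sampled rows would see the rare category in only about one row in twenty. These are the kinds of difficulties the abstract attributes to tabular data.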
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Gaussian Processes and Bayesian Inference · Bayesian Methods and Mixture Models
