A Conditional GAN for Tabular Data Generation with Probabilistic Sampling of Latent Subspaces
Leonidas Akritidis, Panayiotis Bozanis

TL;DR
This paper introduces ctdGAN, a novel conditional GAN that improves tabular data generation by considering data subspaces and class labels, effectively addressing class imbalance and enhancing data fidelity.
Contribution
The study proposes a probabilistic sampling strategy and a new loss function that let a conditional GAN generate data within the subspaces occupied by the real data, improving over existing methods.
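The summary does not spell out how the sampling strategy works. The sketch below shows one plausible reading, under stated assumptions. Each real sample is assumed to already carry a subspace (cluster) label from an earlier partitioning step such as k-means. For a requested class, a subspace is then drawn with probability proportional to how many real samples of that class fall in each cluster. The function names `cluster_probabilities` and `sample_condition` are hypothetical, and this is not the authors' implementation.

```python
import random
from collections import Counter


def cluster_probabilities(cluster_labels, class_labels, target_class):
    """Estimate P(cluster | class = target_class) from co-occurrence counts.

    Assumption: cluster_labels come from an earlier partitioning step
    (e.g. k-means); this is an illustration, not ctdGAN's exact rule.
    """
    counts = Counter(
        c for c, y in zip(cluster_labels, class_labels) if y == target_class
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError(f"no real samples with class {target_class!r}")
    # Clusters containing no sample of target_class never appear in counts,
    # so they receive zero probability.
    return {c: n / total for c, n in sorted(counts.items())}


def sample_condition(cluster_labels, class_labels, target_class,
                     n_clusters, classes, rng):
    """Draw a cluster for target_class and build a one-hot condition vector.

    The vector concatenates a cluster one-hot and a class one-hot, so the
    class label is encoded separately from the other categorical features.
    """
    probs = cluster_probabilities(cluster_labels, class_labels, target_class)
    cluster = rng.choices(list(probs), weights=list(probs.values()), k=1)[0]
    cluster_onehot = [1.0 if k == cluster else 0.0 for k in range(n_clusters)]
    class_onehot = [1.0 if c == target_class else 0.0 for c in classes]
    return cluster, cluster_onehot + class_onehot


if __name__ == "__main__":
    clusters = [0, 0, 1, 1, 2, 2, 2, 1]
    labels = ["a", "b", "b", "b", "a", "a", "b", "a"]
    print(cluster_probabilities(clusters, labels, "a"))
    # prints {0: 0.25, 1: 0.25, 2: 0.5}
    rng = random.Random(0)
    print(sample_condition(clusters, labels, "a", 3, ["a", "b"], rng))
```

Clusters with no member of the requested class get zero weight. As a result, a minority-class sample is only ever conditioned on regions where that class actually occurs. This matches the intuition of avoiding generation in arbitrary locations of the data space.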
Findings
Outperforms existing models in generating high-quality synthetic data
Enhances classification accuracy on imbalanced datasets
Effectively captures multiple feature modes without increasing data dimensionality
Abstract
The tabular form is the standard way of representing data in relational database systems and spreadsheets. However, like other forms of data, tabular data suffers from class imbalance, a problem that severely degrades performance in a wide variety of machine learning tasks. One of the most effective remedies is to use Generative Adversarial Networks (GANs) to synthesize artificial instances of the under-represented classes. Despite their good performance, none of the existing GAN models takes into account the vector subspaces of the input samples in the real data space, so they may generate data in arbitrary locations. Moreover, class labels are treated like any other categorical variable during training, which makes conditional sampling by class less effective. To overcome these problems, this study presents ctdGAN, a…
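The abstract's point about class labels can be made concrete with the training-by-sampling scheme of CTGAN, a widely used conditional GAN for tabular data. The abstract does not name specific models, so this choice of example is ours. CTGAN picks one discrete column uniformly at random. It then picks one of that column's categories with probability proportional to the logarithm of its frequency. The class column is simply one of the candidate columns. The sketch below computes how often a given class value ends up as the condition under that scheme, assuming a log(1 + count) weighting. The function name `ctgan_condition_prob` and the example column counts are hypothetical.

```python
import math


def ctgan_condition_prob(column_counts, column, value):
    """Probability that (column, value) is drawn as the training condition.

    Mimics CTGAN-style training-by-sampling: pick one discrete column
    uniformly, then a category with probability proportional to
    log(1 + count). The class column gets no special treatment.
    """
    p_column = 1.0 / len(column_counts)
    weights = {v: math.log1p(n) for v, n in column_counts[column].items()}
    return p_column * weights[value] / sum(weights.values())


if __name__ == "__main__":
    # Hypothetical dataset with three discrete columns, one of them the class.
    counts = {
        "label": {"minority": 50, "majority": 950},
        "sex": {"F": 500, "M": 500},
        "region": {"N": 300, "S": 300, "E": 200, "W": 200},
    }
    p = ctgan_condition_prob(counts, "label", "minority")
    print(f"P(condition = label:minority) = {p:.3f}")
    # prints P(condition = label:minority) = 0.121
```

Here roughly one condition in eight targets the minority class. The share shrinks further as discrete columns are added, because each column is chosen with probability 1/d. Treating the class label separately from the other categorical columns removes this dependence on the number of columns.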
Taxonomy
Topics: Imbalanced Data Classification Techniques · Machine Learning and Data Classification · Data Quality and Management
