CTTVAE: Latent Space Structuring for Conditional Tabular Data Generation on Imbalanced Datasets
Milosh Devic, Jordan Gierschendorf, David Garson

TL;DR
CTTVAE is a conditional Transformer-based variational autoencoder that combines class-aware latent structuring with adaptive sampling to improve the quality of synthetic minority-class data in imbalanced tabular datasets.
Contribution
The paper proposes CTTVAE, which pairs a class-aware triplet margin loss with a training-by-sampling strategy to strengthen minority-class representation without destabilizing training.
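
As an illustration of the triplet margin idea, the sketch below computes a class-aware triplet loss over a small batch of latent vectors in plain Python. It is a minimal sketch under stated assumptions, not the authors' implementation: the Euclidean distance, the random choice of positive and negative, the default margin of 1.0, and the function names are all hypothetical choices made for this example.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def class_aware_triplet_loss(latents, labels, margin=1.0, rng=None):
    """Average triplet loss over a batch (hypothetical helper).

    For each anchor, the positive is a random other sample of the same class
    and the negative is a random sample of a different class. Random mining
    is an assumption of this sketch; the paper may mine triplets differently.
    """
    rng = rng or random.Random(0)
    losses = []
    for i, (z, y) in enumerate(zip(latents, labels)):
        positives = [j for j, lab in enumerate(labels) if lab == y and j != i]
        negatives = [j for j, lab in enumerate(labels) if lab != y]
        if not positives or not negatives:
            continue  # no valid triplet for this anchor
        p = rng.choice(positives)
        n = rng.choice(negatives)
        losses.append(triplet_margin_loss(z, latents[p], latents[n], margin))
    return sum(losses) / len(losses) if losses else 0.0
```

The loss is zero once every anchor is closer to its same-class sample than to its other-class sample by at least the margin, which is the compact-within-class, separated-between-class structure that a triplet objective encourages in the latent space.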
Findings
Outperforms existing models on minority class utility across six benchmarks.
Achieves better class representation and downstream task performance.
Remains competitive on data fidelity and privacy.
Abstract
Generating synthetic tabular data under severe class imbalance is essential for domains where rare but high-impact events drive decision-making. However, most generative models either overlook minority groups or fail to produce samples that are useful for downstream learning. We introduce CTTVAE, a Conditional Transformer-based Tabular Variational Autoencoder equipped with two complementary mechanisms: (i) a class-aware triplet margin loss that restructures the latent space for sharper intra-class compactness and inter-class separation, and (ii) a training-by-sampling strategy that adaptively increases exposure to underrepresented groups. Together, these components form CTTVAE+TBS, a framework that consistently yields more representative and utility-aligned samples without destabilizing training. Across six real-world benchmarks, CTTVAE+TBS achieves the strongest downstream utility on…
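
One way to picture a training-by-sampling strategy is as a batch sampler that first draws a class and then draws a row from that class, with class probabilities flattened so that rare classes appear more often than their raw frequency. The sketch below uses log-frequency weights, log(1 + count), which is an assumption borrowed from the common CTGAN-style formulation; the abstract does not specify the weighting CTTVAE uses, and the function names here are hypothetical.

```python
import math
import random
from collections import defaultdict


def class_sampling_probs(labels):
    """Class probabilities proportional to log(1 + class count).

    Assumed weighting for illustration only; it boosts minority classes
    relative to their raw frequency without ignoring majority classes.
    """
    counts = defaultdict(int)
    for y in labels:
        counts[y] += 1
    weights = {y: math.log(1 + c) for y, c in counts.items()}
    total = sum(weights.values())
    return {y: w / total for y, w in weights.items()}


def sample_batch(labels, batch_size, rng):
    """Return row indices for one batch: pick a class, then a row of that class."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    probs = class_sampling_probs(labels)
    classes = list(probs)
    picked = rng.choices(classes, weights=[probs[c] for c in classes], k=batch_size)
    return [rng.choice(by_class[c]) for c in picked]


# Usage: a 99:1 imbalanced label column.
labels = [0] * 990 + [1] * 10
batch = sample_batch(labels, 64, random.Random(0))
```

With this weighting, the minority class in the usage example is drawn with probability of roughly 0.26 rather than its raw share of 0.01, so the model sees minority rows far more often during training.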
Taxonomy
Topics: Imbalanced Data Classification Techniques · Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare
