An improved tabular data generator with VAE-GMM integration

Patricia A. Apellániz; Juan Parras; Santiago Zazo

arXiv:2404.08434 · cs.LG · November 15, 2024 · 1 citation

TL;DR

This paper introduces a VAE-GMM model for generating synthetic tabular data that better captures complex data distributions, outperforming existing methods such as the GAN-based CTGAN and the VAE-based TVAE, especially in healthcare applications.

Contribution

The paper presents a VAE with an integrated Bayesian Gaussian Mixture (BGM) model that handles mixed data types and non-Gaussian distributions, improving synthetic data generation over prior models.

Findings

1. Outperforms CTGAN and TVAE on real-world datasets
2. Handles both continuous and discrete features effectively
3. Provides more accurate data distribution modeling

Abstract

The rising use of machine learning in various fields requires robust methods to create synthetic tabular data. Data should preserve key characteristics while addressing data scarcity challenges. Current approaches based on Generative Adversarial Networks, such as the state-of-the-art CTGAN model, struggle with the complex structures inherent in tabular data. These data often contain both continuous and discrete features with non-Gaussian distributions. Therefore, we propose a novel Variational Autoencoder (VAE)-based model that addresses these limitations. Inspired by the TVAE model, our approach incorporates a Bayesian Gaussian Mixture model (BGM) within the VAE architecture. This avoids the limitations imposed by assuming a strictly Gaussian latent space, allowing for a more accurate representation of the underlying data distribution during data generation. Furthermore, our model…
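The abstract's central idea, replacing the strictly Gaussian latent space with a Bayesian Gaussian Mixture when generating data, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the latent codes are synthetic stand-ins for encoder outputs, `decode` is a hypothetical linear placeholder for a trained VAE decoder, and scikit-learn's `BayesianGaussianMixture` plays the role of the BGM. The paper's actual architecture, training objective, and hyperparameters may differ.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# Hypothetical stand-in for encoder outputs: 600 latent vectors in 2-D, drawn
# from three separated clusters so that a single N(0, I) would fit them poorly.
rng = np.random.default_rng(0)
centers = np.array([[-4.0, 0.0], [0.0, 4.0], [4.0, 0.0]])
latent_codes = np.vstack([c + 0.5 * rng.standard_normal((200, 2)) for c in centers])

# Fit a Bayesian Gaussian Mixture to the latent codes. With a Dirichlet-process
# weight prior, surplus components are driven toward near-zero weight, so
# n_components acts as an upper bound rather than a fixed count.
bgm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    max_iter=500,
    random_state=0,
)
bgm.fit(latent_codes)


def sample_latent(n: int) -> np.ndarray:
    """Draw n latent vectors from the fitted mixture instead of from N(0, I)."""
    z, _component_labels = bgm.sample(n)
    return z


def decode(z: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder for a trained decoder: a fixed 2-D -> 3-D linear map."""
    w = np.array([[1.0, 0.5, -0.2], [0.3, -1.0, 0.8]])
    return z @ w


z_new = sample_latent(1000)
synthetic_rows = decode(z_new)
active = int(np.sum(bgm.weights_ > 0.01))  # components carrying non-negligible weight
print(synthetic_rows.shape, "active components:", active)
```

Sampling from the fitted mixture keeps new latent vectors in the regions the encoder actually occupies; in this toy setup, a plain N(0, I) draw would often land between the clusters, where a decoder would have seen little training data.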
