High-Quality Tabular Data Generation using Post-Selected VAE

Volodymyr Shulakov

arXiv:2407.13016 · cs.LG · July 19, 2024 · 1 citation

Open Access

TL;DR

This paper presents PSVAE, a simple yet effective variational autoencoder-based method for generating high-quality synthetic tabular data efficiently, addressing limitations of previous models in handling complex datasets.

Contribution

Introduction of PSVAE, a novel model combining loss optimization and post-selection to improve synthetic tabular data quality and runtime performance.

Findings

01 Produces high-quality synthetic data efficiently

02 Handles complex datasets better than previous models

03 Compensates for underrepresented categories

Abstract

Synthetic tabular data is becoming a necessity as concerns about data privacy intensify. Tabular data is useful for testing systems, simulating real data, analyzing the data itself, and building predictive models. Unfortunately, real data may be unavailable because of confidentiality issues. Previous techniques, such as TVAE (Xu et al., 2019) or OCTGAN (Kim et al., 2021), either cannot handle particularly complex datasets or are complex themselves, resulting in inferior run-time performance. This paper introduces PSVAE, a new, simple model capable of producing high-quality synthetic data in less run time. PSVAE incorporates two key ideas: loss optimization and post-selection. In addition, the proposed model compensates for underrepresented categories and uses a modern activation function, Mish (Misra, 2019).
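
The abstract names Mish but does not define it. As defined by Misra (2019), Mish is x · tanh(softplus(x)), where softplus(x) = ln(1 + e^x). The following is a minimal standalone sketch of that formula only; the function names are illustrative, and it is not taken from the PSVAE implementation, whose architecture the abstract does not describe.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable ln(1 + e^x).

    Uses the identity ln(1 + e^x) = max(x, 0) + ln(1 + e^(-|x|)),
    which avoids overflow for large positive x.
    """
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


if __name__ == "__main__":
    # Print a few sample points to show the shape of the function:
    # roughly linear for large positive inputs, a small negative dip
    # below zero, and near zero for large negative inputs.
    for value in (-5.0, -1.0, 0.0, 1.0, 5.0):
        print(f"mish({value:+.1f}) = {mish(value):+.6f}")
```

Unlike ReLU, Mish is smooth and lets small negative values pass through, which is the property usually cited as its motivation.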

Taxonomy

Topics: Image Processing and 3D Reconstruction · Advanced Database Systems and Queries · Time Series Analysis and Forecasting

Methods: Tanh Activation