TL;DR
This paper surveys three paradigms for tabular data generation—GANs, diffusion models, and LLMs—and introduces a unified framework with a reference implementation.
Contribution
It presents a modular, unified framework that supports three generative paradigms for tabular data (GANs, diffusion models, and LLMs) and validates it through experiments on seven benchmark datasets.
Findings
GAN-based augmentation can improve downstream performance under distribution shift
The framework supports data preprocessing, training, inference, and evaluation for all three paradigms
The reference implementation is publicly available for reproducibility
Abstract
Generative models for tabular data have evolved rapidly beyond Generative Adversarial Networks (GANs). While GANs pioneered synthetic tabular data generation, recent advances in diffusion models and large language models (LLMs) have opened new paradigms with complementary strengths in sample quality, privacy, and controllability. In this paper, we survey the landscape of tabular data generation across three major paradigms (GANs, diffusion models, and LLMs) and introduce a unified, modular framework that supports all three. The framework encompasses data preprocessing, a model-agnostic interface layer, standardized training and inference pipelines, and a comprehensive evaluation module. We validate the framework through experiments on seven benchmark datasets, demonstrating that GAN-based augmentation can improve downstream performance under distribution shift. The framework and its…
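To make the idea of a model-agnostic interface layer concrete, here is a minimal Python sketch. It is not the paper's API: the names `TabularGenerator`, `IndependentColumnSampler`, `run_pipeline`, and `mean_gap` are hypothetical, and the toy sampler is only a stand-in for a real GAN, diffusion, or LLM backend. The point it illustrates is that a shared `fit`/`sample` contract lets one training, inference, and evaluation path serve any backend.

```python
import random
from abc import ABC, abstractmethod
from typing import Dict, List

Row = Dict[str, object]


class TabularGenerator(ABC):
    """Hypothetical model-agnostic interface; not the paper's actual API."""

    @abstractmethod
    def fit(self, rows: List[Row]) -> None:
        """Learn from real rows."""

    @abstractmethod
    def sample(self, n: int) -> List[Row]:
        """Return n synthetic rows."""


class IndependentColumnSampler(TabularGenerator):
    """Toy stand-in for a GAN, diffusion, or LLM backend (an assumption).

    Resamples each column independently from its observed values, so it
    preserves marginals but ignores cross-column dependencies.
    """

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._columns: Dict[str, List[object]] = {}

    def fit(self, rows: List[Row]) -> None:
        if not rows:
            raise ValueError("need at least one row to fit")
        self._columns = {name: [r[name] for r in rows] for name in rows[0]}

    def sample(self, n: int) -> List[Row]:
        if not self._columns:
            raise RuntimeError("call fit() before sample()")
        return [
            {name: self._rng.choice(vals) for name, vals in self._columns.items()}
            for _ in range(n)
        ]


def run_pipeline(model: TabularGenerator, real: List[Row], n: int) -> List[Row]:
    """Shared train-then-sample path that works for any backend."""
    model.fit(real)
    return model.sample(n)


def mean_gap(real: List[Row], synthetic: List[Row], column: str) -> float:
    """Toy evaluation metric: absolute gap between numeric column means."""
    real_mean = sum(r[column] for r in real) / len(real)
    synth_mean = sum(r[column] for r in synthetic) / len(synthetic)
    return abs(real_mean - synth_mean)


# Made-up illustrative rows, not data from the paper.
real_rows = [
    {"age": 25, "segment": "a"},
    {"age": 40, "segment": "b"},
    {"age": 31, "segment": "a"},
    {"age": 58, "segment": "c"},
]
synthetic_rows = run_pipeline(IndependentColumnSampler(seed=0), real_rows, 100)
gap = mean_gap(real_rows, synthetic_rows, "age")
```

Under this design, swapping in a different paradigm means writing another `TabularGenerator` subclass; `run_pipeline` and the evaluation code stay unchanged.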
