AutoDiff: combining Auto-encoder and Diffusion model for tabular data synthesizing
Namjoon Suh, Xiaofeng Lin, Din-Yin Hsieh, Merhdad Honarkhah, Guang Cheng

TL;DR
This paper introduces AutoDiff, a novel approach combining auto-encoders and diffusion models to generate high-quality synthetic tabular data that preserves feature correlations and improves downstream task performance.
Contribution
The paper presents a new diffusion-based method built on an auto-encoder architecture and designed specifically for tabular data synthesis, addressing the challenges of heterogeneous features and feature correlations.
Findings
Synthetic data shows high statistical fidelity to real data.
Model captures complex feature correlations effectively.
Outperforms existing tabular data synthesizers in utility tasks.
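One simple way to quantify how well a synthesizer preserves feature correlations is to compare the Pearson correlation matrices of the real and synthetic tables. The sketch below is an illustration only and is not necessarily the metric used in the paper; the function name `correlation_gap` and the toy Gaussian data are assumptions made for this example.

```python
import numpy as np

def correlation_gap(real, synthetic):
    """Mean absolute difference between the two Pearson correlation matrices."""
    corr_real = np.corrcoef(real, rowvar=False)
    corr_syn = np.corrcoef(synthetic, rowvar=False)
    return float(np.mean(np.abs(corr_real - corr_syn)))

rng = np.random.default_rng(0)

# Toy "real" table: two numeric columns with correlation 0.8 (hypothetical data).
cov = [[1.0, 0.8], [0.8, 1.0]]
real = rng.multivariate_normal([0.0, 0.0], cov, size=2000)

# A synthesizer that preserves the correlation vs. one that ignores it.
good_syn = rng.multivariate_normal([0.0, 0.0], cov, size=2000)
bad_syn = rng.standard_normal((2000, 2))  # independent columns

gap_good = correlation_gap(real, good_syn)
gap_bad = correlation_gap(real, bad_syn)
print(f"gap (correlation preserved): {gap_good:.3f}")
print(f"gap (correlation ignored):   {gap_bad:.3f}")
```

A smaller gap means the synthetic table reproduces the pairwise linear dependencies of the real one more closely. Pearson correlation only applies to numeric columns, so mixed-type tables would need other association measures for categorical features.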
Abstract
Diffusion models have become a main paradigm for synthetic data generation in many subfields of modern machine learning, including computer vision, language modeling, and speech synthesis. In this paper, we leverage the power of diffusion models to generate synthetic tabular data. The heterogeneous features in tabular data have been a main obstacle in tabular data synthesis, and we tackle this problem by employing an auto-encoder architecture. Compared with state-of-the-art tabular synthesizers, the synthetic tables from our model show high statistical fidelity to the real data and perform well in downstream machine learning utility tasks. We conducted experiments on publicly available datasets. Notably, our model adeptly captures the correlations among features, which has been a long-standing challenge in tabular data synthesis. Our code is available…
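The general idea of mapping heterogeneous columns to a continuous space and then diffusing there can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy table, the one-hot and standardization preprocessing, the fixed random linear map standing in for a trained encoder, and the linear DDPM-style noise schedule are all choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy table: one numeric column and one categorical column (hypothetical data).
age = np.array([23.0, 45.0, 31.0, 52.0])
color = np.array(["red", "blue", "red", "green"])

def to_continuous(numeric, categorical):
    """Standardize the numeric column and one-hot encode the categorical one."""
    z = (numeric - numeric.mean()) / numeric.std()
    categories = np.unique(categorical)  # sorted unique labels
    one_hot = (categorical[:, None] == categories[None, :]).astype(float)
    return np.column_stack([z, one_hot]), categories

def linear_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear noise schedule."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward_diffuse(z0, t, alpha_bar, rng):
    """Sample z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(z0.shape)
    z_t = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return z_t, eps

x, categories = to_continuous(age, color)

# Stand-in "encoder": a fixed random linear map to a 2-d latent space.
# A real auto-encoder would learn this map jointly with a decoder.
W = rng.standard_normal((x.shape[1], 2))
z0 = x @ W

alpha_bar = linear_schedule(num_steps=1000)
z_t, eps = forward_diffuse(z0, t=999, alpha_bar=alpha_bar, rng=rng)
```

In a full pipeline, a denoising network would be trained to predict `eps` from `z_t`, new latents would be sampled by reversing the noising process, and a decoder would map them back to numeric and categorical columns.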
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
Methods: Diffusion
