CausalDiffTab: Mixed-Type Causal-Aware Diffusion for Tabular Data Generation
Jia-Chen Zhang, Zheng Zhou, Yu-Jie Xiong, Chun-Ming Xia, Fei Dai

TL;DR
CausalDiffTab is a diffusion-based generative model tailored for generating high-quality mixed-type tabular data, capturing complex variable interactions while preserving data privacy.
Contribution
It introduces a novel diffusion model with causal regularization for generating complex mixed-type tabular data, addressing heterogeneity and inter-variable relationships.
Findings
Outperforms baseline methods across all metrics on seven datasets.
Effectively captures complex interactions among variables.
Enhances data quality for privacy-preserving data synthesis.
Abstract
Training data has been proven to be one of the most critical components in training generative AI. However, obtaining high-quality data remains challenging, with data privacy issues presenting a significant hurdle. To address the need for high-quality data, synthetic data has emerged as a mainstream solution, demonstrating impressive performance in areas such as images, audio, and video. Generating mixed-type data, especially high-quality tabular data, still faces significant challenges. These primarily include its inherently heterogeneous data types, complex inter-variable relationships, and intricate column-wise distributions. In this paper, we introduce CausalDiffTab, a diffusion-based generative model specifically designed to handle mixed tabular data containing both numerical and categorical features, while being more flexible in capturing complex interactions among variables.…
Taxonomy
Topics: Time Series Analysis and Forecasting · Data Quality and Management · Scientific Computing and Data Management
Methods: Diffusion
