Diffusion models for missing value imputation in tabular data
Shuhan Zheng, Nontawat Charoenphakdee

TL;DR
This paper introduces TabCSDI, a diffusion model tailored to missing value imputation in tabular data. It reports that TabCSDI is effective compared with existing methods and that the choice of categorical embedding technique matters.
Contribution
The paper proposes a novel diffusion model approach, TabCSDI, for tabular data imputation, incorporating techniques for handling categorical and numerical variables.
Findings
TabCSDI outperforms existing imputation methods on benchmark datasets.
Categorical embedding techniques significantly impact imputation performance.
Effective handling of mixed variable types improves imputation accuracy.
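To make "mixed variable types" concrete, the sketch below puts numerical and categorical columns into one real-valued matrix that a generative model could consume. It standardizes numerical columns and one-hot encodes a categorical column, which is only one common option; this summary does not say which encodings the paper compares. The function name encode_mixed_table, the use of NaN and -1 as missing markers, and the toy values are all assumptions made for illustration.

```python
import numpy as np

def encode_mixed_table(numeric, categorical, n_categories):
    """Hypothetical helper: encode mixed-type rows plus an observed-entry mask.

    numeric:      float array (n_rows, n_num); NaN marks a missing entry (assumption).
    categorical:  int array (n_rows,); -1 marks a missing entry (assumption).
    """
    # Standardize with statistics of the observed (non-NaN) entries only.
    mean = np.nanmean(numeric, axis=0)
    std = np.nanstd(numeric, axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    num_observed = ~np.isnan(numeric)
    num_scaled = np.where(num_observed, (numeric - mean) / std, 0.0)

    # One-hot encode observed categories; missing ones stay all-zero.
    cat_observed = categorical >= 0
    one_hot = np.zeros((len(categorical), n_categories))
    one_hot[cat_observed, categorical[cat_observed]] = 1.0

    features = np.hstack([num_scaled, one_hot])
    # The mask has the same shape as features: True where a value was observed.
    cat_mask = np.repeat(cat_observed[:, None], n_categories, axis=1)
    mask = np.hstack([num_observed, cat_mask])
    return features, mask

# Toy table: one numerical column, one categorical column with 3 levels.
numeric = np.array([[1.0], [3.0], [np.nan]])
categorical = np.array([0, 2, -1])
features, mask = encode_mixed_table(numeric, categorical, n_categories=3)
```

The mask is what an imputation model would use to tell observed entries from entries it must fill in; placeholder zeros in features carry no information on their own.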
Abstract
Missing value imputation in machine learning is the task of accurately estimating the missing values in a dataset using the available information. For this task, several deep generative modeling methods have been proposed and have demonstrated their usefulness, e.g., generative adversarial imputation networks. Recently, diffusion models have gained popularity because of their effectiveness in generative modeling of images, text, audio, etc. To our knowledge, little attention has been paid to investigating the effectiveness of diffusion models for missing value imputation in tabular data. Building on the recent development of diffusion models for time-series data imputation, we propose a diffusion model approach called "Conditional Score-based Diffusion Models for Tabular data" (TabCSDI). To effectively handle categorical variables and numerical variables simultaneously, we investigate…
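As a rough illustration of how a conditional diffusion model can frame imputation, the sketch below applies the standard DDPM forward noising step, x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, only to the entries chosen as imputation targets, while the conditioning entries stay clean. This is a generic sketch and not the paper's implementation: the function name noise_targets, the linear beta schedule, the number of steps, and the toy data are all assumptions.

```python
import numpy as np

def noise_targets(x0, cond_mask, t, alpha_bar, rng):
    """Hypothetical helper: noise target entries, keep conditioning entries.

    x0:        clean values, shape (n_rows, n_features).
    cond_mask: True where an entry is used as conditioning information.
    t:         diffusion step index into alpha_bar.
    """
    eps = rng.standard_normal(x0.shape)
    a_t = alpha_bar[t]
    # Standard DDPM forward step applied elementwise.
    x_t = np.sqrt(a_t) * x0 + np.sqrt(1.0 - a_t) * eps
    # Only imputation targets (cond_mask == False) receive noise.
    x_mixed = np.where(cond_mask, x0, x_t)
    return x_mixed, eps

# Assumed linear beta schedule with 100 steps (illustrative values only).
betas = np.linspace(1e-4, 0.02, 100)
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = np.array([[0.5, -1.2, 0.3],
               [1.1, 0.0, -0.7]])
cond_mask = np.array([[True, False, True],
                      [False, True, True]])
x_noisy, eps = noise_targets(x0, cond_mask, t=50, alpha_bar=alpha_bar, rng=rng)
```

In training, a denoising network would typically receive x_noisy, cond_mask, and t, and be asked to predict eps on the target entries; at inference time the truly missing entries play the role of the targets.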
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Bayesian Methods and Mixture Models · Gaussian Processes and Bayesian Inference
Methods: Diffusion
