Self-Supervision Improves Diffusion Models for Tabular Data Imputation

Yixin Liu; Thalaiyasingam Ajanthan; Hisham Husain; Vu Nguyen

arXiv:2407.18013·cs.LG·July 26, 2024

TL;DR

This paper presents SimpDM, a self-supervised diffusion model designed for tabular data imputation, which improves stability and robustness over existing methods through alignment and data augmentation strategies.

Contribution

The paper introduces SimpDM, a novel self-supervised diffusion model with alignment and data augmentation techniques tailored for robust tabular data imputation.

Findings

1. SimpDM outperforms state-of-the-art imputation methods in various scenarios.
2. Self-supervised alignment improves stability of diffusion-based imputation.
3. State-dependent data augmentation enhances robustness with limited data.

Abstract

The ubiquity of missing data has drawn considerable attention to tabular data imputation methods. Diffusion models, recognized as the state-of-the-art technique for data generation, show significant potential for tabular data imputation. However, in pursuit of diversity, vanilla diffusion models are often sensitive to the initial noise, which prevents them from generating stable and accurate imputation results. Additionally, the sparsity inherent in tabular data makes it difficult for diffusion models to model the data manifold accurately, which reduces their robustness for data imputation. To tackle these challenges, this paper introduces the Self-supervised imputation Diffusion Model (SimpDM for brevity), a diffusion model specifically tailored for tabular data imputation. To mitigate sensitivity to noise, we introduce a…
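
The noise sensitivity described in the abstract can be made concrete with a small experiment. The sketch below is only an illustration under stated assumptions: `toy_stochastic_impute` is a hypothetical stand-in for a stochastic sampler, not SimpDM and not the authors' code, and all function names and parameters are invented for this example. It imputes the same masked table with several random seeds, then reports how much the imputed values vary across seeds and the error on the missing cells.

```python
import numpy as np


def make_observed_mask(shape, missing_rate, rng):
    """Boolean mask that is True where a cell is observed."""
    return rng.random(shape) >= missing_rate


def toy_stochastic_impute(x, observed, seed, steps=20, noise_scale=0.5):
    """Hypothetical stand-in for a noise-initialized sampler (NOT SimpDM).

    Missing cells start as Gaussian noise and are pulled toward the mean of
    the observed values in their column, with fresh noise added at each step.
    """
    rng = np.random.default_rng(seed)
    # Column means computed from observed cells only.
    col_mean = np.nanmean(np.where(observed, x, np.nan), axis=0)
    x_hat = np.where(observed, x, rng.standard_normal(x.shape))
    for t in range(steps, 0, -1):
        # Injected noise shrinks linearly and is zero at the final step.
        sigma = noise_scale * (t - 1) / steps
        proposal = 0.5 * (x_hat + col_mean) + sigma * rng.standard_normal(x.shape)
        # Observed cells are always reset to their known values.
        x_hat = np.where(observed, x, proposal)
    return x_hat


def masked_rmse(x_hat, x_true, observed):
    """RMSE over the missing cells only."""
    missing = ~observed
    return float(np.sqrt(np.mean((x_hat[missing] - x_true[missing]) ** 2)))


# Synthetic table; the ground truth is known, so the error can be measured.
rng = np.random.default_rng(0)
x_true = rng.standard_normal((200, 5))
observed = make_observed_mask(x_true.shape, missing_rate=0.3, rng=rng)

# Impute the same table with different seeds and compare the results.
runs = np.stack(
    [toy_stochastic_impute(x_true, observed, seed=s) for s in range(8)]
)
seed_spread = runs.std(axis=0)  # per-cell standard deviation across seeds
rmse_per_seed = [masked_rmse(r, x_true, observed) for r in runs]

print("mean spread on missing cells:", seed_spread[~observed].mean())
print("RMSE per seed:", np.round(rmse_per_seed, 4))
```

A nonzero spread on the missing cells means that the imputed value depends on the random seed, which is the kind of instability the abstract attributes to vanilla diffusion models. The same two measurements (spread across seeds and masked RMSE) could in principle be applied to any stochastic imputer.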

Code & Models

Repositories

yixinliu233/simpdm (PyTorch, official)

Taxonomy

Methods

Softmax · Attention Is All You Need · Focus · Diffusion