Tabular Data Generation using Binary Diffusion

Vitaliy Kinakh; Slava Voloshynovskiy

arXiv:2409.13882 · cs.LG · October 30, 2024

TL;DR

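This paper introduces a lossless binary transformation that converts tabular data into fixed-size binary representations, together with Binary Diffusion, a generative model designed for such binary data. The combination simplifies preprocessing and is reported to outperform existing models on benchmark datasets.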

Contribution

The paper presents a new binary transformation method and a diffusion model tailored for binary data, reducing preprocessing complexity and improving performance.
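
As a rough illustration of what a fixed-size, reversible binary encoding of a table row can look like, the sketch below encodes a continuous column as the 32 bits of each value's float32 representation and a categorical column as a fixed-width binary index. These two encoding choices, and all function names, are assumptions made for illustration; they are not taken from this page and may differ from the authors' implementation. The round trip is exact only up to float32 precision.

```python
import numpy as np


def encode_float_column(values):
    """Return an (n, 32) uint8 bit matrix: the float32 bits of each value."""
    # Big-endian float32 so the bit order is the same on every platform.
    as_bytes = np.asarray(values, dtype=">f4").view(np.uint8).reshape(-1, 4)
    return np.unpackbits(as_bytes, axis=1)


def decode_float_column(bits):
    """Invert encode_float_column: (n, 32) bits back to n float32 values."""
    as_bytes = np.packbits(bits, axis=1)  # shape (n, 4), uint8
    return as_bytes.view(">f4").reshape(-1)


def encode_categorical_column(values, categories):
    """Return an (n, width) uint8 bit matrix holding each category's index."""
    # Smallest number of bits that can index every category (at least 1).
    width = max(1, (len(categories) - 1).bit_length())
    index = {category: i for i, category in enumerate(categories)}
    codes = np.array([index[v] for v in values], dtype=np.uint32)
    shifts = np.arange(width - 1, -1, -1, dtype=np.uint32)
    return ((codes[:, None] >> shifts) & 1).astype(np.uint8)


def decode_categorical_column(bits, categories):
    """Invert encode_categorical_column: bits back to category labels."""
    weights = 1 << np.arange(bits.shape[1] - 1, -1, -1)
    codes = bits.astype(np.int64) @ weights
    return [categories[i] for i in codes]


# Hypothetical two-column table: one continuous and one categorical column.
ages = [23.5, 41.0, 67.25]
jobs = ["nurse", "clerk", "nurse"]
job_categories = ["clerk", "nurse", "pilot"]

# Each row becomes one fixed-size bit vector (32 + 2 bits here).
rows = np.hstack([
    encode_float_column(ages),
    encode_categorical_column(jobs, job_categories),
])
```

Concatenating the per-column bit matrices gives every row the same length regardless of column type, which is what lets a single model operate on mixed-type tables.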

Findings

01 Outperforms state-of-the-art models on benchmark datasets

02 Eliminates need for extensive preprocessing and large pretrained models

03 Achieves better results with a smaller model size

Abstract

Generating synthetic tabular data is critical in machine learning, especially when real data is limited or sensitive. Traditional generative models often face challenges due to the unique characteristics of tabular data, such as mixed data types and varied distributions, and require complex preprocessing or large pretrained models. In this paper, we introduce a novel, lossless binary transformation method that converts any tabular data into fixed-size binary representations, and a corresponding new generative model called Binary Diffusion, specifically designed for binary data. Binary Diffusion leverages the simplicity of XOR operations for noise addition and removal and employs binary cross-entropy loss for training. Our approach eliminates the need for extensive preprocessing, complex noise parameter tuning, and pretraining on large datasets. We evaluate our model on several popular…
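
The XOR-based noising and the binary cross-entropy loss mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: the Bernoulli flip probability `flip_prob`, the function names, and the use of NumPy are assumptions, and the denoising network that would predict the bits is omitted entirely.

```python
import numpy as np


def add_xor_noise(x0, flip_prob, rng):
    """Flip each bit of x0 independently with probability flip_prob."""
    # z is a random binary mask; XOR with 1 flips a bit, XOR with 0 keeps it.
    z = (rng.random(x0.shape) < flip_prob).astype(np.uint8)
    return x0 ^ z, z


def remove_xor_noise(xt, z):
    """XOR is its own inverse, so applying the same mask restores x0."""
    return xt ^ z


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between probabilities and binary targets."""
    pred = np.clip(np.asarray(pred, dtype=np.float64), eps, 1.0 - eps)
    target = np.asarray(target, dtype=np.float64)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))


rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, size=(4, 8), dtype=np.uint8)  # toy binary rows
xt, z = add_xor_noise(x0, flip_prob=0.3, rng=rng)      # noised rows and mask
restored = remove_xor_noise(xt, z)                      # equals x0
```

Because XOR is self-inverse, a model that predicts the mask `z` (or the clean bits directly) from `xt` gives a way back to `x0`, and both prediction targets are binary, which is why a binary cross-entropy loss is a natural fit.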

Code & Models

Repositories

vkinakh/binary-diffusion-tabular
PyTorch · Official

Models

vitaliykinakh/binary-ddpm-tabular
Hugging Face model

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Diffusion