Tabular Data Generation using Binary Diffusion
Vitaliy Kinakh, Slava Voloshynovskiy

TL;DR
This paper introduces a lossless binary transformation for tabular data and Binary Diffusion, a generative model designed for the resulting binary representations. Together they simplify preprocessing and outperform existing models on benchmark datasets.
Contribution
The paper presents a new lossless binary transformation and a diffusion model tailored to binary data, which together reduce preprocessing complexity and improve performance.
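As a rough illustration of what a lossless, fixed-size binary transformation of a table can look like, the sketch below encodes a numeric column as its 32-bit float pattern and a categorical column as a fixed-width integer code. This is only one plausible scheme under stated assumptions, not necessarily the authors' exact method. All function names and the example columns are hypothetical.

```python
import numpy as np

def float_to_bits(values):
    """Encode values as their 32-bit float patterns (lossless for float32)."""
    arr = np.asarray(values, dtype=">f4")        # big-endian float32
    raw = arr.view(np.uint8).reshape(-1, 4)      # 4 bytes per value
    return np.unpackbits(raw, axis=1)            # shape (n, 32), entries 0/1

def bits_to_float(bits):
    """Invert float_to_bits."""
    raw = np.packbits(np.asarray(bits, dtype=np.uint8), axis=1)  # shape (n, 4)
    return np.ascontiguousarray(raw).view(">f4").reshape(-1).astype(np.float32)

def categories_to_bits(codes, n_categories):
    """Encode integer category codes with a fixed number of bits."""
    width = max(1, (n_categories - 1).bit_length())
    codes = np.asarray(codes, dtype=np.int64)
    shifts = np.arange(width - 1, -1, -1)        # most significant bit first
    return ((codes[:, None] >> shifts) & 1).astype(np.uint8)

def bits_to_categories(bits):
    """Invert categories_to_bits."""
    bits = np.asarray(bits, dtype=np.int64)
    weights = 1 << np.arange(bits.shape[1] - 1, -1, -1)
    return bits @ weights

# Hypothetical two-column table: one numeric column, one categorical column.
ages = np.array([23.5, 41.0, 0.1], dtype=np.float32)
colors = np.array([0, 2, 1])                     # codes for 3 categories

# Each row becomes one fixed-size binary vector: 32 + 2 = 34 bits.
rows = np.concatenate([float_to_bits(ages), categories_to_bits(colors, 3)], axis=1)
print(rows.shape)  # (3, 34)
```

Because every row maps to a binary vector of the same length regardless of column type, a single model can operate on mixed numeric and categorical data without per-column normalization choices. Decoding simply slices the vector back into its column segments and applies the inverse functions.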
Findings
Outperforms state-of-the-art models on benchmark datasets
Eliminates the need for extensive preprocessing and large pretrained models
Achieves better results with a smaller model size
Abstract
Generating synthetic tabular data is critical in machine learning, especially when real data is limited or sensitive. Traditional generative models often face challenges due to the unique characteristics of tabular data, such as mixed data types and varied distributions, and require complex preprocessing or large pretrained models. In this paper, we introduce a novel, lossless binary transformation method that converts any tabular data into fixed-size binary representations, and a corresponding new generative model called Binary Diffusion, specifically designed for binary data. Binary Diffusion leverages the simplicity of XOR operations for noise addition and removal and employs binary cross-entropy loss for training. Our approach eliminates the need for extensive preprocessing, complex noise parameter tuning, and pretraining on large datasets. We evaluate our model on several popular…
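The XOR-based noising and the binary cross-entropy objective mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming noise is a random bit mask drawn with a flip probability. The names `xor_noise`, `xor_denoise`, and `flip_prob` are hypothetical, the noise schedule and network architecture are not given in the text above, and no model is trained here.

```python
import numpy as np

def xor_noise(x0, flip_prob, rng):
    """Corrupt binary data by XOR-ing it with a random Bernoulli bit mask."""
    z = (rng.random(x0.shape) < flip_prob).astype(np.uint8)
    return np.bitwise_xor(x0, z), z

def xor_denoise(xt, z_hat):
    """Undo the corruption: XOR with the same mask restores the original bits."""
    return np.bitwise_xor(xt, z_hat)

def binary_cross_entropy(target, prob, eps=1e-7):
    """Mean BCE between 0/1 targets and predicted probabilities."""
    target = np.asarray(target, dtype=np.float64)
    prob = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    return float(-np.mean(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob)))

rng = np.random.default_rng(0)
x0 = rng.integers(0, 2, size=(4, 8), dtype=np.uint8)   # toy binary rows
xt, z = xor_noise(x0, flip_prob=0.3, rng=rng)

# XOR is its own inverse, so knowing the mask recovers the data exactly.
restored = xor_denoise(xt, z)

# A denoiser would output per-bit probabilities; an uninformed guess of 0.5
# for every bit gives a loss of ln(2), and a perfect prediction gives ~0.
loss_uninformed = binary_cross_entropy(z, np.full(z.shape, 0.5))
loss_perfect = binary_cross_entropy(z, z.astype(np.float64))
```

Since XOR is self-inverse, adding and removing noise use the same operation, which is the simplicity the abstract refers to. In a trained model the mask would be replaced by the network's thresholded prediction rather than the true mask used in this toy example.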
Taxonomy
Topics: Image Retrieval and Classification Techniques
Methods: Diffusion
