TabEBM: A Tabular Data Augmentation Method with Distinct Class-Specific Energy-Based Models
Andrei Margeloiu, Xiangjian Jiang, Nikola Simidjievski, Mateja Jamnik

TL;DR
TabEBM introduces class-specific energy-based models for tabular data augmentation. By modeling each class separately, it significantly improves synthetic data quality and classification performance on small datasets.
Contribution
The paper presents a novel approach that trains a distinct EBM for each class, improving both the quality of the synthetic data and downstream classification results relative to existing methods.
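The core idea is to fit one energy model per class and draw synthetic rows from each model separately, labeling each row with the class of the model that generated it. A toy version of this idea is sketched below under explicit assumptions. The per-class energy is the negative log of a Gaussian kernel density estimate, a stand-in chosen so the example runs without training and not the energy TabEBM actually uses. Samples come from unadjusted Langevin dynamics. The names `make_class_energy`, `sgld_sample` and `augment`, along with the `bandwidth` and `step_size` values, are hypothetical.

```python
import numpy as np


def make_class_energy(x_class, bandwidth=0.5):
    """Toy energy for ONE class: E_c(x) = -log sum_i exp(-||x - x_i||^2 / (2 h^2)).

    Hypothetical stand-in for a learned class-specific EBM (not TabEBM's energy).
    Returns a function computing grad E_c at a batch of points of shape (m, d).
    """
    x_class = np.asarray(x_class, dtype=float)

    def grad_energy(x):
        diffs = x[:, None, :] - x_class[None, :, :]          # (m, n, d)
        sq_dist = np.sum(diffs ** 2, axis=-1)                 # (m, n)
        logits = -sq_dist / (2 * bandwidth ** 2)
        logits -= logits.max(axis=1, keepdims=True)           # numerically stable softmax
        w = np.exp(logits)
        w /= w.sum(axis=1, keepdims=True)                     # kernel responsibilities
        # dE/dx = sum_i w_i (x - x_i) / h^2
        return np.einsum("mn,mnd->md", w, diffs) / bandwidth ** 2

    return grad_energy


def sgld_sample(grad_energy, x_init, n_steps=200, step_size=0.01, rng=None):
    """Unadjusted Langevin dynamics: x <- x - (a/2) grad E(x) + sqrt(a) * N(0, I)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x_init, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step_size * grad_energy(x) + np.sqrt(step_size) * noise
    return x


def augment(X, y, n_per_class, rng=None):
    """Fit one energy per class, sample from each, append labeled synthetic rows."""
    rng = np.random.default_rng(0) if rng is None else rng
    X_syn, y_syn = [], []
    for c in np.unique(y):
        X_c = X[y == c]                                       # the model for c sees only class-c rows
        grad_e = make_class_energy(X_c)
        start = X_c[rng.integers(len(X_c), size=n_per_class)]  # chains start at real class-c rows
        X_syn.append(sgld_sample(grad_e, start, rng=rng))
        y_syn.append(np.full(n_per_class, c))                 # label = class of the generating model
    return np.vstack([X] + X_syn), np.concatenate([y] + y_syn)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = np.vstack([rng.normal(-2, 0.5, (10, 2)), rng.normal(2, 0.5, (10, 2))])
    y = np.array([0] * 10 + [1] * 10)
    X_aug, y_aug = augment(X, y, n_per_class=25)
    print(X_aug.shape, y_aug.shape)  # (70, 2) (70,)
```

Because every class has its own energy, sampling for class `c` never consults data from other classes. This separation is the property the summary attributes to TabEBM. In a real implementation, the KDE stand-in would be replaced by a trained energy function.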
Findings
Synthetic data from TabEBM outperforms existing methods in quality.
Augmentation with TabEBM improves classification accuracy on small datasets.
Distinct class-specific EBMs create more robust energy landscapes.
Abstract
Data collection is often difficult in critical fields such as medicine, physics, and chemistry. As a result, classification methods usually perform poorly with these small datasets, leading to weak predictive performance. Increasing the training set with additional synthetic data, similar to data augmentation in images, is commonly believed to improve downstream classification performance. However, current tabular generative methods that learn either the joint distribution or the class-conditional distribution often overfit on small datasets, resulting in poor-quality synthetic data, usually worsening classification performance compared to using real data alone. To solve these challenges, we introduce TabEBM, a novel class-conditional generative method using Energy-Based Models (EBMs). Unlike existing methods that use a shared model to…
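Standard energy-based model notation can express the contrast the abstract draws between a single shared class-conditional model and distinct per-class models. The symbols $E_{\theta_c}$, $Z(\theta_c)$, $\mathcal{D}_c$ and $\mathcal{D}_{\text{aug}}$ are illustrative notation chosen here and may differ from the paper's own.

```latex
% Shared model: one parameter set \theta is fit on all classes at once
p_\theta(x \mid y = c), \qquad c = 1, \dots, C

% Distinct class-specific EBMs: model c is fit only on \mathcal{D}_c = \{x_i : y_i = c\}
p_{\theta_c}(x) = \frac{\exp\!\big(-E_{\theta_c}(x)\big)}{Z(\theta_c)},
\qquad Z(\theta_c) = \int \exp\!\big(-E_{\theta_c}(x)\big)\,dx

% Augmented training set: each synthetic row takes the label of the model that generated it
\mathcal{D}_{\text{aug}} = \mathcal{D} \;\cup\; \bigcup_{c=1}^{C} \big\{(\tilde{x}, c) : \tilde{x} \sim p_{\theta_c}\big\}
```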
Taxonomy
Topics: Neural Networks and Applications
Methods: Sparse Evolutionary Training · energy-based model
