Hybrid Autoencoders for Tabular Data: Leveraging Model-Based Augmentation in Low-Label Settings
Erel Naor, Ofir Lindenbaum

TL;DR
This paper introduces a hybrid autoencoder combining neural and decision tree encoders with model-based augmentation to improve learning on tabular data, especially with limited labels.
Contribution
It presents a novel hybrid autoencoder architecture with sample-specific gating and model-based augmentation, enhancing representation learning for tabular data under low-label regimes.
Findings
Outperforms deep and tree-based baselines on tabular datasets.
Improves low-label classification and regression accuracy.
Leverages complementary inductive biases of neural and decision tree encoders.
Abstract
Deep neural networks often underperform on tabular data due to their sensitivity to irrelevant features and a spectral bias toward smooth, low-frequency functions. These limitations hinder their ability to capture the sharp, high-frequency signals that often define tabular structure, especially when labeled samples are limited. While self-supervised learning (SSL) offers promise in such settings, it remains challenging in tabular domains due to the lack of effective data augmentations. We propose a hybrid autoencoder that combines a neural encoder with an oblivious soft decision tree (OSDT) encoder, each guided by its own stochastic gating network that performs sample-specific feature selection. Together, these structurally different encoders and model-specific gating networks implement model-based augmentation, producing complementary input views tailored to each architecture. The two…
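To make the two ingredients named in the abstract more concrete, here is a minimal, dependency-free sketch of sample-specific stochastic gating and of a soft oblivious tree producing leaf probabilities. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sample_gates`, `osdt_leaf_probs`), the linear gating network, the clipped-Gaussian gate relaxation, and all parameter values are hypothetical, since this summary does not specify them.

```python
import math
import random


def hard_sigmoid(z):
    """Clip z into [0, 1]."""
    return min(1.0, max(0.0, z))


def sample_gates(x, weights, bias, sigma=0.5, rng=None):
    """Return one gate in [0, 1] per feature of the sample x.

    Assumption: the gating network is a single linear layer (weights is a
    d x d nested list, bias has length d) and each gate is a clipped,
    noise-perturbed activation. Passing rng=None disables the noise.
    """
    gates = []
    for j in range(len(x)):
        mu = sum(w * xi for w, xi in zip(weights[j], x)) + bias[j]
        noise = rng.gauss(0.0, sigma) if rng is not None else 0.0
        gates.append(hard_sigmoid(mu + 0.5 + noise))
    return gates


def osdt_leaf_probs(x, level_weights, thresholds, temperature=1.0):
    """Soft oblivious tree: every node at a given depth shares one split.

    Each level routes the sample right with a sigmoid probability, so a
    tree of depth D yields 2**D leaf probabilities that sum to 1.
    """
    probs = [1.0]
    for w, b in zip(level_weights, thresholds):
        score = sum(wi * xi for wi, xi in zip(w, x))
        right = 1.0 / (1.0 + math.exp(-(score - b) / temperature))
        probs = [p * side for p in probs for side in (1.0 - right, right)]
    return probs


if __name__ == "__main__":
    rng = random.Random(0)
    x = [0.5, -1.0, 2.0]
    identity_like = [[0.1, 0.0, 0.0], [0.0, 0.1, 0.0], [0.0, 0.0, 0.1]]

    # Two independently parameterized gating networks give two masked
    # views of the same sample, one per encoder (hypothetical parameters).
    gates_nn = sample_gates(x, identity_like, [0.0, 0.0, 0.0], rng=rng)
    gates_tree = sample_gates(x, identity_like, [0.2, -0.2, 0.0], rng=rng)
    view_nn = [g * xi for g, xi in zip(gates_nn, x)]
    view_tree = [g * xi for g, xi in zip(gates_tree, x)]

    # The tree view is encoded as a distribution over 2**2 = 4 leaves.
    leaves = osdt_leaf_probs(view_tree, [[1, 0, 0], [0, 1, 0]], [0.0, 0.0])
    print(view_nn, view_tree, leaves)
```

In this sketch the "augmentation" comes only from the two gating networks masking the same input differently; how the paper trains the gates, decodes the views, and couples the two encoders is not covered by the truncated abstract and is left out here.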
Taxonomy
Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
