Hybrid Autoencoders for Tabular Data: Leveraging Model-Based Augmentation in Low-Label Settings

Erel Naor; Ofir Lindenbaum

arXiv:2511.06961·cs.LG·November 11, 2025

TL;DR

This paper introduces a hybrid autoencoder that pairs a neural encoder with an oblivious soft decision tree (OSDT) encoder and uses model-based augmentation to improve learning on tabular data, especially when labels are scarce.

Contribution

The paper presents a hybrid autoencoder architecture with sample-specific gating and model-based augmentation, which improves representation learning for tabular data in low-label regimes.
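To make "sample-specific gating" concrete, here is a minimal NumPy sketch of one common formulation: a clipped-Gaussian stochastic gate whose mean is predicted separately for each sample. The summary does not specify the gate's exact form, so the linear gating network (`W`, `b`), the noise scale `sigma`, and the function name `sample_specific_gates` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def sample_specific_gates(x, W, b, sigma=0.5, rng=None, training=True):
    """Return per-sample feature gates in [0, 1] and the gated input.

    x: (n, d) batch of tabular rows.
    W: (d, d), b: (d,) parameters of a hypothetical linear gating network.
    """
    # Each row gets its own gate means, so feature selection differs per sample.
    mu = x @ W + b
    noise = 0.0
    if training:
        # Gaussian noise makes the gates stochastic during training.
        rng = np.random.default_rng() if rng is None else rng
        noise = sigma * rng.standard_normal(mu.shape)
    # Clipping to [0, 1] lets a gate close (0) or open (1) a feature completely.
    z = np.clip(mu + 0.5 + noise, 0.0, 1.0)
    return z, x * z
```

With `training=False` the noise is dropped and the gates become a deterministic function of the input, which is the usual choice at inference time for this kind of gate.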

Findings

1. Outperforms deep and tree-based baselines on tabular datasets.
2. Improves low-label classification and regression accuracy.
3. Leverages complementary inductive biases of neural and decision tree encoders.

Abstract

Deep neural networks often under-perform on tabular data due to their sensitivity to irrelevant features and a spectral bias toward smooth, low-frequency functions. These limitations hinder their ability to capture the sharp, high-frequency signals that often define tabular structure, especially under limited labeled samples. While self-supervised learning (SSL) offers promise in such settings, it remains challenging in tabular domains due to the lack of effective data augmentations. We propose a hybrid autoencoder that combines a neural encoder with an oblivious soft decision tree (OSDT) encoder, each guided by its own stochastic gating network that performs sample-specific feature selection. Together, these structurally different encoders and model-specific gating networks implement model-based augmentation, producing complementary input views tailored to each architecture. The two…
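For readers unfamiliar with oblivious soft decision trees, the sketch below shows a generic forward pass in NumPy. In an oblivious tree every node at a given depth shares one split, and the "soft" variant replaces each hard split with a sigmoid so routing is differentiable. The truncated abstract does not give the authors' parameterization, so the linear splits (`W`, `t`), the `temperature` parameter, the learned `leaf_values`, and the function name `osdt_encode` are all assumptions made for illustration.

```python
import itertools
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def osdt_encode(x, W, t, leaf_values, temperature=1.0):
    """Encode rows with one oblivious soft decision tree.

    x: (n, d) inputs (in the paper's setup these would already be gated).
    W: (depth, d) split directions, one shared by all nodes at each level.
    t: (depth,) split thresholds.
    leaf_values: (2**depth, k) embedding stored at each leaf.
    """
    depth = W.shape[0]
    # Soft probability of going "right" at each level: (n, depth).
    p_right = sigmoid((x @ W.T - t) / temperature)
    # Enumerate every root-to-leaf path as a row of 0/1 decisions.
    paths = np.array(list(itertools.product([0, 1], repeat=depth)))
    # Probability of each decision along each path: (n, 2**depth, depth).
    per_level = np.where(paths[None, :, :] == 1,
                         p_right[:, None, :],
                         1.0 - p_right[:, None, :])
    # A leaf's probability is the product of its decisions; rows sum to 1.
    leaf_probs = per_level.prod(axis=2)
    # The encoding is the probability-weighted mix of leaf embeddings.
    return leaf_probs @ leaf_values, leaf_probs
```

Lowering `temperature` pushes the routing toward hard, axis-like splits, which is one way such an encoder can represent the sharp, high-frequency structure the abstract describes.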

Videos

Hybrid Autoencoders for Tabular Data: Leveraging Model-Based Augmentation in Low-Label Settings · SlidesLive

Taxonomy

Topics: Machine Learning and Data Classification · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis