Revisiting Deep Learning Models for Tabular Data
Yury Gorishniy, Ivan Rubachev, Valentin Khrulkov, Artem Babenko

TL;DR
This paper reviews deep learning models for tabular data, introduces two strong baseline architectures, and compares their performance with existing models and Gradient Boosted Decision Trees under standardized protocols.
Contribution
It identifies simple, effective deep architectures as strong baselines and provides a comprehensive comparison across multiple datasets with consistent protocols.
Findings
A ResNet-like architecture is a strong baseline that prior work often omits.
The Transformer adaptation (FT-Transformer) outperforms other deep learning models on most tasks.
No single model is universally best across all tabular datasets.
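To make "ResNet-like" concrete, here is a minimal sketch of a residual MLP over numerical features, written in plain Python. It is not the authors' implementation: the layer ordering, the omission of normalization and dropout, and every name (`residual_block`, `build_model`, `width`, `hidden`) are assumptions chosen for illustration. The only idea it shows is the skip connection, where each block adds a learned update to its input rather than replacing it.

```python
import random


def linear(x, weight, bias):
    """Affine map; weight is a list of rows with shape (out_dim, in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def init_linear(in_dim, out_dim, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    scale = in_dim ** -0.5
    weight = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def residual_block(x, params):
    """Return x + Linear(ReLU(Linear(x))): a skip connection around a small MLP."""
    (w1, b1), (w2, b2) = params
    update = linear(relu(linear(x, w1, b1)), w2, b2)
    return [xi + ui for xi, ui in zip(x, update)]


def build_model(n_features, width, hidden, n_blocks, n_outputs, seed=0):
    """Hypothetical helper: input projection, residual blocks, prediction head."""
    rng = random.Random(seed)
    first = init_linear(n_features, width, rng)
    blocks = [(init_linear(width, hidden, rng), init_linear(hidden, width, rng))
              for _ in range(n_blocks)]
    head = init_linear(width, n_outputs, rng)
    return first, blocks, head


def resnet_like_forward(features, first, blocks, head):
    """Project the feature vector, apply each residual block, then the head."""
    x = linear(features, *first)
    for params in blocks:
        x = residual_block(x, params)
    return linear(relu(x), *head)


first, blocks, head = build_model(n_features=4, width=8, hidden=16,
                                  n_blocks=2, n_outputs=1)
prediction = resnet_like_forward([0.5, -1.2, 3.0, 0.0], first, blocks, head)
```

Because the block output is the input plus an update, a block whose second layer is all zeros is exactly the identity. That is a quick way to sanity-check any implementation of this pattern.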
Abstract
The existing literature on deep learning for tabular data proposes a wide range of novel architectures and reports competitive results on various datasets. However, the proposed models are usually not properly compared to each other and existing works often use different benchmarks and experiment protocols. As a result, it is unclear for both researchers and practitioners what models perform best. Additionally, the field still lacks effective baselines, that is, the easy-to-use models that provide competitive performance across different problems. In this work, we perform an overview of the main families of DL architectures for tabular data and raise the bar of baselines in tabular DL by identifying two simple and powerful deep architectures. The first one is a ResNet-like architecture which turns out to be a strong baseline that is often missing in prior works. The second model is…
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
Methods: Attention Is All You Need · Linear Layer · FT-Transformer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Dropout · Layer Normalization · Multi-Head Attention
