Tabular Data: Is Deep Learning all you need?

Guri Zabërgja; Arlind Kadra; Christian M. M. Frey; Josif Grabocka

arXiv:2402.03970·cs.LG·October 7, 2025·2 cites

Open Access · 3 Reviews

TL;DR

This paper benchmarks recent deep learning models against classical ML methods on tabular data, revealing that deep learning can outperform traditional approaches in diverse real-world datasets.

Contribution

It provides a comprehensive empirical comparison of 17 methods, including neural networks and classical ML, on 68 datasets, highlighting a potential paradigm shift.

Findings

01

Deep learning methods outperform classical approaches on tabular data.

02

Neural networks show competitive performance across diverse datasets.

03

The study offers a benchmark for future research in tabular data modeling.

Abstract

Tabular data represent one of the most prevalent data formats in applied machine learning, largely because they accommodate a broad spectrum of real-world problems. Existing literature has studied many of the shortcomings of neural architectures on tabular data and has repeatedly confirmed the scalability and robustness of gradient-boosted decision trees across varied datasets. However, recent deep learning models have not been subjected to a comprehensive evaluation under conditions that allow for a fair comparison with existing classical approaches. This situation motivates an investigation into whether recent deep-learning paradigms outperform classical ML methods on tabular data. Our survey fills this gap by benchmarking seventeen state-of-the-art methods, spanning neural networks, classical ML and AutoML techniques. Our empirical results over 68 diverse datasets from a…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

The first (broader) strength and argument for the paper is in its overall message and survey-ish nature. The field of Tabular Deep Learning did progress in recent years and the paper manages to convey this message. I think it may be important for the broader DL community to know about the subfield advances. Another strong aspect (much less important in my view) is in more subtle findings that are novel: - RealMLP performance seems to be much less good compared to its results on TabArena - th

Weaknesses

The core weakness of this work is that it lacks field context. By lacking context I mean that the paper, for its main goal (which seems to be conveying the message of progress in DL for tabular data), misses a bit on where the field actually is. First, I think that the recent focus on dataset quality in benchmarks brought up in recent work ([Erickson et al.](https://arxiv.org/abs/2506.16791), [Rubachev et al.](https://arxiv.org/abs/2406.19380), [Tschalzev et al.](https://arxiv.org/abs/2503.09159

Reviewer 02 · Rating 2 · Confidence 5

Strengths

The fact that refitting (after performing hyperparameter optimization) is more beneficial for GBDTs than for DNNs is interesting and new to me; at least, I have never encountered it in the literature.

Weaknesses

(1) The paper investigates only classification problems and it is not clear from the title/abstract/contributions bullet list. The final claim in the abstract can be misleading, for instance, "deep learning methods outperform classical approaches" can be false for regression, see the recent TabArena leaderboard for regression. (2) The chosen wording can also be misleading. For instance, the claim "nonfinetuned foundation models outperform fine-tuned ones" can be unclear, since rigorously speaki

Reviewer 03 · Rating 2 · Confidence 5

Strengths

- The manuscript addresses a central challenge in tabular learning, namely fairly benchmarking ML and DL methods on tabular datasets.
- It provides valuable insights for practitioners, particularly regarding training strategies and hyperparameter sensitivity.
- The paper is clearly written and easy to follow.

Weaknesses

The primary weakness is limited novelty. The TabArena work [1] already provides: (1) a large-scale, reproducible benchmarking ecosystem, (2) a live leaderboard that can be continuously updated, (3) a carefully curated dataset collection, and (4) strong baselines with advanced evaluation protocols. Moreover, TabArena reports similar empirical conclusions—for example, that DL methods can outperform classical methods in certain regimes. As a result, it is unclear what unique contribution this paper

Taxonomy

Topics: Data Visualization and Analytics · Computational Physics and Python Applications