Squeezing Lemons with Hammers: An Evaluation of AutoML and Tabular Deep Learning for Data-Scarce Classification Applications

Ricardo Knauer; Erik Rodner

arXiv:2405.07662·cs.LG·May 14, 2024

TL;DR

This paper compares L2-regularized logistic regression with AutoML frameworks and off-the-shelf deep neural networks on 44 small tabular classification datasets (at most 500 samples each). Logistic regression performs similarly on the majority of datasets, so the authors recommend it as the first choice.

Contribution

It compares a simple baseline (L2-regularized logistic regression) with complex methods (AutoPrognosis, AutoGluon, TabPFN, HyperFast) on 44 tabular classification datasets with at most 500 samples, showing how well logistic regression holds up in data-scarce scenarios.

Findings

01

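L2-regularized logistic regression performs similarly to AutoML frameworks (AutoPrognosis, AutoGluon) and off-the-shelf deep neural networks (TabPFN, HyperFast) on the majority of the 44 benchmark datasets.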

02

With sample sizes of at most 500, methods that rely on meta-learning and ensembling do not clearly outperform the simple baseline on most datasets.

03

Practitioners should consider logistic regression as the first choice for data-scarce tabular classification before turning to more complex methods.

Abstract

Many industry verticals are confronted with small-sized tabular data. In this low-data regime, it is currently unclear whether the best performance can be expected from simple baselines or from more complex machine learning approaches that leverage meta-learning and ensembling. On 44 tabular classification datasets with sample sizes $\leq$ 500, we find that L2-regularized logistic regression performs similarly to state-of-the-art automated machine learning (AutoML) frameworks (AutoPrognosis, AutoGluon) and off-the-shelf deep neural networks (TabPFN, HyperFast) on the majority of the benchmark datasets. We therefore recommend considering logistic regression as the first choice for data-scarce applications with tabular data and provide practitioners with best practices for further method selection.
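
To make the recommended baseline concrete, the following is a minimal sketch of L2-regularized logistic regression in plain Python. It is not the authors' experimental setup: the synthetic two-blob data, the helper names (`fit_logreg_l2`, `predict`, `make_toy_data`), and the hyperparameters (`l2`, `lr`, `epochs`) are all assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logreg_l2(X, y, l2=1.0, lr=0.1, epochs=500):
    """Fit binary logistic regression by batch gradient descent.

    Minimizes mean log-loss + (l2 / (2 * n)) * ||w||^2.
    The bias term is not penalized.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one sample: p(y=1 | x) - y.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # The l2 * wj term is the gradient of the L2 penalty (weight shrinkage).
        w = [wj - lr * (gj + l2 * wj) / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, X):
    # Classify as 1 when the predicted probability is at least 0.5.
    return [
        1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
        for xi in X
    ]


def make_toy_data(n, seed):
    # Hypothetical synthetic data: two Gaussian blobs with 2 features each.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 1.5 if label else -1.5
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


# A small training set, in the spirit of the <= 500 sample regime.
X_train, y_train = make_toy_data(200, seed=0)
X_test, y_test = make_toy_data(100, seed=1)

w, b = fit_logreg_l2(X_train, y_train, l2=1.0)
preds = predict(w, b, X_test)
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print(f"test accuracy: {accuracy:.2f}")
```

In practice a library implementation such as scikit-learn's `LogisticRegression`, which applies L2 regularization by default, would normally be used. The hand-written loop only shows how the penalty enters the weight update: a larger `l2` pulls the weights more strongly toward zero.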

Taxonomy

Topics: Machine Learning and Data Classification

Methods: Logistic Regression