STRABLE: Benchmarking Tabular Machine Learning with Strings

Gioia Blayer; Myung Jun Kim; Félix Lefebvre; Lennart Purucker; Alan Arazi; Eilam Shapira; Roi Reichart; Frank Hutter; Marine Le Morvan; David Holzmüller; Gaël Varoquaux

arXiv:2605.12292·cs.LG·May 13, 2026

TL;DR

This paper introduces STRABLE, a benchmarking corpus of 108 real-world tables containing both strings and numbers, and reports a large-scale empirical study of 445 tabular learning pipelines that handle string data.

Contribution

The paper provides the first extensive benchmark for tabular data that includes strings and evaluates diverse pipelines, showing which methods work well for which types of string data.

Findings

01

Categorical-dominant tables are well served by simple encodings and advanced learners.

02

Large LLM encoders are competitive on free-text-dominant tables.

03

STRABLE enables generalizable pipeline rankings close to oracle rankings.

Abstract

Benchmarking tabular learning has revealed the benefit of dedicated architectures, pushing the state of the art. But real-world tables often contain string entries, beyond numbers, and these settings have been understudied due to a lack of a solid benchmarking suite. They lead to new research questions: Are dedicated learners needed, with end-to-end modeling of strings and numbers? Or does it suffice to encode strings as numbers, as with a categorical encoding? And if so, do the resulting tables resemble numerical tabular data, calling for the same learners? To enable these studies, we contribute STRABLE, a benchmarking corpus of 108 tables, all real-world learning problems with strings and numbers across diverse application fields. We run the first large-scale empirical study of tabular learning with strings, evaluating 445 pipelines. These pipelines span end-to-end architectures and…

Code & Models

Datasets

inria-soda/STRABLE-benchmark
dataset · 1.3k downloads
