TabICL: A Tabular Foundation Model for In-Context Learning on Large Data

Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan

arXiv:2502.05564 · cs.LG · May 27, 2025 · 5 cites

TL;DR

TabICL is a scalable tabular foundation model that uses in-context learning with a novel two-stage architecture. It handles large datasets efficiently, matches TabPFNv2 on classification benchmarks overall, and outperforms existing models on large datasets.
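In-context learning means that the labeled training rows are fed to the model together with the test rows, and predictions come out of a single forward pass with no parameter updates. The toy Python/NumPy sketch below illustrates the idea with one softmax attention step: test rows are queries, training rows are keys, and one-hot training labels are values. It rests on our own assumptions and is not TabICL's model. The function name `icl_predict` is hypothetical, there are no learned projections, and raw dot products stand in for a learned similarity.

```python
import numpy as np


def icl_predict(X_train, y_train, X_test, n_classes):
    """Toy in-context predictor (hypothetical): one softmax attention step, no fitting.

    Test rows are queries, training rows are keys, one-hot labels are values.
    """
    d = X_train.shape[1]
    scores = X_test @ X_train.T / np.sqrt(d)  # (n_test, n_train) similarities
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over training rows
    values = np.eye(n_classes)[y_train]  # one-hot labels, (n_train, n_classes)
    return weights @ values  # class probabilities, (n_test, n_classes)


rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(-3, 0.5, (20, 2)), rng.normal(3, 0.5, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.array([[-3.0, -3.0], [3.0, 3.0]])
proba = icl_predict(X_train, y_train, X_test, n_classes=2)
print(proba.argmax(axis=1))  # [0 1]
```

Nothing is fitted here: swapping in a different `X_train`/`y_train` changes the predictions immediately. A pretrained ICL transformer exploits the same property, with many learned layers in place of this single fixed step.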

Contribution

The paper presents a new two-stage architecture for tabular foundation models that significantly improves scalability and performance in in-context learning on large datasets.
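The data flow of the two-stage design can be sketched as follows. Stage 1 turns a table with any number of columns into one fixed-width vector per row (column-wise attention, then row-wise attention). Stage 2 lets test rows attend to label-augmented training rows. This toy NumPy sketch uses random, untrained weights and only shows shapes and data flow. The following choices are our simplifications, not TabICL's actual layers: the width `D = 16`, a single attention step per stage, a pooling query for rows, restricting column attention to context rows, and adding a label embedding to training rows. All names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # fixed row-embedding width (hypothetical)
MAX_CLASSES = 10  # hypothetical cap on the number of classes


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attend(queries, keys, values):
    """Scaled dot-product attention: (q, D), (k, D), (k, D) -> (q, D)."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    return softmax(scores, axis=-1) @ values


# Random stand-ins for learned weights (untrained, for shapes only).
W_cell = rng.normal(size=(1, D))               # scalar cell value -> D-dim token
row_query = rng.normal(size=(1, D))            # query that pools a row's cells
label_emb = rng.normal(size=(MAX_CLASSES, D))  # one vector per class
W_out = rng.normal(size=(D, MAX_CLASSES))      # row embedding -> class logits


def embed_rows(X, n_ctx):
    """Stage 1: (n, m) table -> (n, D) row embeddings, for any m.

    Column attention uses only the first n_ctx (context/training) rows as keys.
    """
    n, m = X.shape
    cells = X[:, :, None] * W_cell  # (n, m, D): one token per cell
    # Column-wise: each cell attends to the context cells of its own column.
    cols = np.stack(
        [attend(cells[:, j], cells[:n_ctx, j], cells[:n_ctx, j]) for j in range(m)],
        axis=1,
    )  # (n, m, D)
    # Row-wise: one query pools the m cell tokens of each row into a D-vector.
    return np.stack([attend(row_query, cols[i], cols[i])[0] for i in range(n)])


def icl_logits(X_train, y_train, X_test, n_classes):
    """Stage 2: test rows attend to label-augmented training rows."""
    n_train = len(X_train)
    E = embed_rows(np.vstack([X_train, X_test]), n_ctx=n_train)
    E_train = E[:n_train] + label_emb[y_train]  # inject labels into context rows
    E_test = E[n_train:]
    H_test = E_test + attend(E_test, E_train, E_train)  # residual attention step
    return (H_test @ W_out)[:, :n_classes]


X_train = rng.normal(size=(30, 5))
y_train = rng.integers(0, 3, size=30)
X_test = rng.normal(size=(4, 5))
print(icl_logits(X_train, y_train, X_test, n_classes=3).shape)  # (4, 3)
print(embed_rows(rng.normal(size=(8, 12)), n_ctx=8).shape)  # (8, 16)
```

The last line shows the point of stage 1. Tables with 5 or 12 columns both yield `D`-wide rows, so the stage-2 attention works on one token per row rather than one per cell.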

Findings

01. TabICL is up to 10 times faster than TabPFNv2.

02. It performs on par with TabPFNv2 across 200 classification datasets.

03. It surpasses both TabPFNv2 and CatBoost on datasets with more than 10K samples.

Abstract

The long-standing dominance of gradient-boosted decision trees on tabular data is currently challenged by tabular foundation models using In-Context Learning (ICL): setting the training data as context for the test data and predicting in a single forward pass without parameter updates. While the TabPFNv2 foundation model excels on tables with up to 10K samples, its alternating column- and row-wise attentions make handling large training sets computationally prohibitive. So, can ICL be effectively scaled and deliver a benefit for larger tables? We introduce TabICL, a tabular foundation model for classification, pretrained on synthetic datasets with up to 60K samples and capable of handling 500K samples on affordable resources. This is enabled by a novel two-stage architecture: a column-then-row attention mechanism to build fixed-dimensional embeddings of rows, followed by a transformer for…
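A rough count makes the scaling argument concrete. Suppose every cell is a token and attention alternates between columns and rows. One layer then computes m column attentions over n rows (m·n² scores) plus n row attentions over m cells (n·m² scores). Once each row is a single token, an attention layer over rows needs about n² scores. This is back-of-the-envelope arithmetic under simplifying assumptions only. It ignores train/test masking, hidden width, the number of layers, and the cost of building the row embeddings in the first place. The function names and the values of n and m are hypothetical examples.

```python
def cell_level_scores(n: int, m: int) -> int:
    """Attention scores per layer when every cell is a token and attention
    alternates between columns (m blocks of n x n) and rows (n blocks of m x m)."""
    return m * n**2 + n * m**2


def row_level_scores(n: int) -> int:
    """Attention scores per layer when each row is one token (one n x n block)."""
    return n**2


n, m = 50_000, 100  # example: 50K training rows, 100 features
ratio = cell_level_scores(n, m) / row_level_scores(n)
print(f"{cell_level_scores(n, m):.3e} vs {row_level_scores(n):.3e} scores, ratio {ratio:.1f}")
```

With these example values the ratio is m + m²/n = 100.2. The column-wise term m·n² dominates, so per-layer cost grows with both the number of rows and the number of features.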

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Machine Learning in Healthcare

Methods: Softmax · Attention Is All You Need