TabPFN: A Transformer That Solves Small Tabular Classification Problems in a Second
Noah Hollmann, Samuel Müller, Katharina Eggensperger, Frank Hutter

TL;DR
TabPFN is a Transformer-based model that solves small tabular classification tasks quickly and accurately without hyperparameter tuning, matching state-of-the-art methods while running significantly faster.
Contribution
We introduce TabPFN, a trained Transformer that performs in-context learning for small tabular datasets, offering a fast, hyperparameter-free alternative to existing methods.
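The in-context learning interface can be illustrated with a deliberately tiny stand-in: a single attention step in which each test row attends only to the labeled training rows and aggregates their one-hot labels. This is a hypothetical sketch, not TabPFN's architecture. TabPFN is a trained multi-layer Transformer, whereas here the attention scores come from a fixed negative squared distance with a hand-picked `temperature`. The names `in_context_predict` and `temperature` are illustrative assumptions. What the sketch does share with TabPFN is the interface: there is no fitting step, and predictions for the whole test set are computed from (training set, test set) in one pass.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def in_context_predict(
    x_train: Sequence[Sequence[float]],
    y_train: Sequence[int],
    x_test: Sequence[Sequence[float]],
    n_classes: int,
    temperature: float = 1.0,  # hypothetical knob; TabPFN learns its attention
) -> List[List[float]]:
    """Class probabilities for every test row, with no fitting step.

    Test rows are queries, training rows are keys, and one-hot training
    labels are values. Test rows attend only to training rows, never to
    each other, so each test prediction depends only on the training set
    and that test row.
    """
    probs = []
    for q in x_test:
        # Negative squared distance stands in for learned query/key projections.
        scores = [-sum((a - b) ** 2 for a, b in zip(q, k)) / temperature for k in x_train]
        weights = softmax(scores)
        p = [0.0] * n_classes
        for w, label in zip(weights, y_train):
            p[label] += w  # attention-weighted vote over training labels
        probs.append(p)
    return probs


# The "training data" is simply part of the input; no parameters are updated.
x_train = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]]
y_train = [0, 0, 1, 1]
x_test = [[0.05, 0.1], [1.0, 0.95]]

probs = in_context_predict(x_train, y_train, x_test, n_classes=2)
preds = [p.index(max(p)) for p in probs]
print(preds)  # [0, 1]
```

Reordering the training rows leaves the probabilities unchanged (up to floating-point rounding), which is the set-valued behaviour described in the abstract.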
Findings
Outperforms boosted trees on small datasets
Matches state-of-the-art AutoML performance
Achieves up to 230× speedup over AutoML systems (5,700× with a GPU)
Abstract
We present TabPFN, a trained Transformer that can do supervised classification for small tabular datasets in less than a second, needs no hyperparameter tuning, and is competitive with state-of-the-art classification methods. TabPFN performs in-context learning (ICL): it learns to make predictions from sequences of labeled examples (x, f(x)) given in the input, without requiring further parameter updates. TabPFN is fully contained in the weights of our network, which accepts training and test samples as a set-valued input and yields predictions for the entire test set in a single forward pass. TabPFN is a Prior-Data Fitted Network (PFN) and is trained once, offline, to approximate Bayesian inference on synthetic datasets drawn from our prior. This prior incorporates ideas from causal reasoning: it entails a large space of structural causal models with a preference for simple structures.…
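The prior-data fitting idea can be sketched in two parts: sample a synthetic classification dataset from a random structural causal model, then score a predictor by the negative log-likelihood of held-out labels given the remaining labeled rows. Training a PFN amounts to minimizing that loss averaged over many sampled datasets. Everything below is a simplified, hypothetical stand-in for the paper's prior: the random DAG construction, the `tanh` mechanisms, the Gaussian noise, the median-threshold labeling, and the names `sample_scm_dataset` and `pfn_loss` are illustrative assumptions. A low `edge_prob` yields sparse graphs, loosely mirroring the preference for simple structures. The predictor is any function of (training rows, labels, test rows); a label-frequency baseline is used only so the example runs.

```python
import math
import random
from typing import Callable, List, Tuple

Rows = List[List[float]]
Predictor = Callable[[Rows, List[int], Rows], List[List[float]]]


def sample_scm_dataset(
    n_rows: int, n_nodes: int, n_features: int, edge_prob: float, rng: random.Random
) -> Tuple[Rows, List[int]]:
    """Sample one synthetic binary-classification dataset from a random SCM.

    Nodes are in topological order 0..n_nodes-1; node j depends on each
    earlier node i with probability edge_prob (sparser DAG = simpler model).
    Some nodes are exposed as features; one other node defines the label.
    """
    if n_features >= n_nodes:
        raise ValueError("need at least one node left over for the label")
    parents = [[i for i in range(j) if rng.random() < edge_prob] for j in range(n_nodes)]
    weights = [[rng.gauss(0.0, 1.0) for _ in ps] for ps in parents]
    nodes = list(range(n_nodes))
    rng.shuffle(nodes)
    feature_nodes, label_node = nodes[:n_features], nodes[n_features]

    values: Rows = []
    for _ in range(n_rows):
        v = [0.0] * n_nodes
        for j in range(n_nodes):
            # Each node: nonlinear function of its parents plus Gaussian noise.
            pre = sum(w * v[i] for w, i in zip(weights[j], parents[j]))
            v[j] = math.tanh(pre) + rng.gauss(0.0, 0.5)
        values.append(v)

    # Threshold the label node at its median to get two roughly balanced classes.
    median = sorted(v[label_node] for v in values)[n_rows // 2]
    x = [[v[i] for i in feature_nodes] for v in values]
    y = [int(v[label_node] >= median) for v in values]
    return x, y


def pfn_loss(predict: Predictor, x: Rows, y: List[int], n_train: int) -> float:
    """Mean negative log-likelihood of the held-out labels.

    The first n_train rows act as the in-context training set and the rest
    as queries; rows are i.i.d., so this is a random train/test split.
    """
    probs = predict(x[:n_train], y[:n_train], x[n_train:])
    nll = [-math.log(max(p[label], 1e-12)) for p, label in zip(probs, y[n_train:])]
    return sum(nll) / len(nll)


def label_frequency(x_tr: Rows, y_tr: List[int], x_te: Rows) -> List[List[float]]:
    # Baseline that ignores features: Laplace-smoothed class frequencies.
    p1 = (sum(y_tr) + 1) / (len(y_tr) + 2)
    return [[1.0 - p1, p1] for _ in x_te]


rng = random.Random(0)
x, y = sample_scm_dataset(n_rows=200, n_nodes=8, n_features=5, edge_prob=0.3, rng=rng)
loss = pfn_loss(label_frequency, x, y, n_train=150)
print(f"held-out NLL of the baseline: {loss:.3f}")
```

A trained PFN would replace `label_frequency` with the Transformer's forward pass, and its weights would be updated by gradient descent on `pfn_loss` over a fresh synthetic dataset at each step.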
Taxonomy
Topics: Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI) · Bayesian Modeling and Causal Inference
Methods: Multi-Head Attention · Attention Is All You Need · tabular data Prior-data Fitted Network · Test · Linear Layer · Label Smoothing · Adam · Dense Connections · Absolute Position Encodings · Position-Wise Feed-Forward Layer
