Transformers Boost the Performance of Decision Trees on Tabular Data across Sample Sizes
Mayuka Jayawardhana, Renbo, Samuel Dooley, Valeriia Cherepanova, Andrew Gordon Wilson, Frank Hutter, Colin White, Tom Goldstein, Micah Goldblum

TL;DR
This paper introduces LLM-Boost and PFN-Boost, simple fusion methods that combine large language models and TabPFN with gradient-boosted decision trees, improving performance across various dataset sizes on tabular data.
Contribution
The paper presents a novel fusion approach that leverages the strengths of transformers and GBDTs, achieving state-of-the-art results on tabular datasets of varying sizes.
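The summary on this page does not spell out how the fusion works, so the following is only a minimal sketch of one plausible way to combine a pretrained model with boosted trees: start boosting from the pretrained model's scores and let each tree fit what remains. It is an illustration, not the authors' implementation. The names `fit_stump`, `boost_from_prior`, `scale`, and `learning_rate`, the squared-error loss, the single-feature stumps, and the toy data are all assumptions made for the example.

```python
# Hypothetical sketch: boosting that starts from a pretrained model's scores.
# Every name and value here is illustrative, not taken from the paper.

def fit_stump(xs, residuals):
    """Fit a one-split regression stump (single feature) to the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # No valid split (all xs equal): fall back to a constant prediction.
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost_from_prior(xs, ys, prior_scores, scale=1.0, n_rounds=20, learning_rate=0.5):
    """Fit stumps to the residuals left over by the scaled prior scores."""
    preds = [scale * p for p in prior_scores]
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return stumps, preds


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data: a step function, plus made-up scores standing in for a
# pretrained transformer's predictions.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
prior_scores = [0.2, 0.1, 0.3, 0.2, 0.6, 0.7, 0.5, 0.6]

stumps, fused_preds = boost_from_prior(xs, ys, prior_scores)
prior_mse = mse(ys, prior_scores)
fused_mse = mse(ys, fused_preds)
print(f"prior-only MSE: {prior_mse:.4f}, fused MSE: {fused_mse:.4f}")
```

In this sketch, setting `scale=0.0` reduces the procedure to ordinary boosting from a zero initial prediction, while a larger `scale` leans more heavily on the pretrained scores. A real pipeline would use a GBDT library and a classification loss rather than hand-written stumps.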
Findings
PFN-Boost achieves the best average performance across datasets.
Fusion methods outperform standalone models on intermediate dataset sizes.
The fusion methods achieve state-of-the-art results against multiple baselines and ensembling methods.
Abstract
Large language models (LLMs) perform remarkably well on tabular datasets in zero- and few-shot settings, since they can extract meaning from natural language column headers that describe features and labels. Similarly, TabPFN, a recent non-LLM transformer pretrained on numerous tables for in-context learning, has demonstrated excellent performance for dataset sizes up to a thousand samples. In contrast, gradient-boosted decision trees (GBDTs) are typically trained from scratch on each dataset without benefiting from pretraining data and must learn the relationships between columns from their entries alone since they lack natural language understanding. LLMs and TabPFN excel on small tabular datasets where a strong prior is essential, yet they are not competitive with GBDTs on medium or large datasets, since their context lengths are limited. In this paper, we propose a simple and…
Taxonomy
Topics: Machine Learning and Data Classification
Methods: tabular data, Prior-data Fitted Network
