Fine-tuned In-Context Learning Transformers are Excellent Tabular Data Classifiers

Felix den Breejen; Sangmin Bae; Stephen Cha; Se-Young Yun

arXiv:2405.13396 · cs.LG · January 24, 2025

TL;DR

This paper improves ICL-transformers for tabular data classification by fine-tuning them, introducing a new pretraining dataset generator, and combining dataset generators to improve both fine-tuning and zero-shot performance.

Contribution

It extends TabPFN to the fine-tuning setting, proposes a new forest dataset generator that produces datasets with complex decision boundaries, and combines dataset generators during pretraining for better overall performance.

Findings

01 Fine-tuning significantly boosts ICL-transformer performance.

02 Pretraining on complex datasets improves fine-tuning results.

03 Combining dataset generators yields state-of-the-art performance.

Abstract

The recently introduced TabPFN pretrains an In-Context Learning (ICL) transformer on synthetic data to perform tabular data classification. In this work, we extend TabPFN to the fine-tuning setting, resulting in a significant performance boost. We also discover that fine-tuning enables ICL-transformers to create complex decision boundaries, a property regular neural networks do not have. Based on this observation, we propose to pretrain ICL-transformers on a new forest dataset generator which creates datasets that are unrealistic, but have complex decision boundaries. TabForest, the ICL-transformer pretrained on this dataset generator, shows better fine-tuning performance when pretrained on more complex datasets. Additionally, TabForest outperforms TabPFN on some real-world datasets when fine-tuning, despite having lower zero-shot performance due to the unrealistic nature of the…
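
The abstract does not describe how the forest dataset generator works internally. As a hedged illustration only, the Python sketch below shows one assumed way to produce synthetic classification datasets with complex, axis-aligned decision boundaries: grow a random decision tree, give each leaf a random class label, and label uniformly sampled points with that tree. This is not the authors' TabForest generator; `Node`, `random_tree`, `predict`, `make_forest_like_dataset`, and all parameter values are hypothetical.

```python
# Hypothetical sketch; NOT the TabForest generator from the paper.
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    """A node of a random decision tree (hypothetical helper type)."""

    feature: int = -1          # split feature (internal nodes only)
    threshold: float = 0.0     # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    label: int = -1            # class label (leaves only); -1 marks an internal node


def random_tree(n_features: int, n_classes: int, depth: int, rng: random.Random,
                low: List[float], high: List[float]) -> Node:
    """Grow a full random tree of the given depth inside the box [low, high]."""
    if depth == 0:
        # Leaf: assign a uniformly random class label.
        return Node(label=rng.randrange(n_classes))
    f = rng.randrange(n_features)
    t = rng.uniform(low[f], high[f])
    # Shrink the box on each side so deeper splits stay inside their own cell.
    left_high = high.copy()
    left_high[f] = t
    right_low = low.copy()
    right_low[f] = t
    return Node(
        feature=f,
        threshold=t,
        left=random_tree(n_features, n_classes, depth - 1, rng, low, left_high),
        right=random_tree(n_features, n_classes, depth - 1, rng, right_low, high),
    )


def predict(node: Node, x: List[float]) -> int:
    """Route a point down the tree and return the label of the leaf it reaches."""
    while node.label < 0:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label


def make_forest_like_dataset(n_samples: int, n_features: int, n_classes: int,
                             depth: int, seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Sample points uniformly in [0, 1)^d and label them with one random tree."""
    rng = random.Random(seed)
    tree = random_tree(n_features, n_classes, depth, rng,
                       [0.0] * n_features, [1.0] * n_features)
    X = [[rng.random() for _ in range(n_features)] for _ in range(n_samples)]
    y = [predict(tree, x) for x in X]
    return X, y


X, y = make_forest_like_dataset(n_samples=512, n_features=5, n_classes=3, depth=8, seed=42)
print(len(X), len(X[0]), sorted(set(y)))
```

In this sketch, increasing `depth` multiplies the number of leaf cells (2**depth), which fragments the decision boundary; realistic feature distributions are deliberately not the goal, in the spirit of the abstract's description of the generated datasets as unrealistic.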

Code & Models

Repositories

felixdenbreejen/tabforestpfn
pytorch · Official

Taxonomy

Topics: Anomaly Detection Techniques and Applications

Methods: tabular data, Prior-data Fitted Network