Retrieval & Fine-Tuning for In-Context Tabular Models
Valentin Thomas, Junwei Ma, Rasa Hosseinzadeh, Keyvan Golestan, Guangwei Yu, Maksims Volkovs, Anthony Caterini

TL;DR
This paper introduces LoCalPFN, a method combining retrieval and fine-tuning to adapt transformer-based tabular models to local data subsets, significantly improving performance on diverse datasets.
Contribution
The paper proposes a retrieval and fine-tuning scheme for transformer-based tabular in-context models, built on TabPFN, and reports state-of-the-art results on a benchmark of 95 datasets curated by TabZilla from OpenML.
Findings
LoCalPFN outperforms previous models on a benchmark of 95 datasets curated by TabZilla from OpenML.
Retrieval and fine-tuning significantly boost model performance.
The approach surpasses tuned tree-based models.
Abstract
Tabular data is a pervasive modality spanning a wide range of domains, and the inherent diversity poses a considerable challenge for deep learning. Recent advancements using transformer-based in-context learning have shown promise on smaller and less complex datasets, but have struggled to scale to larger and more complex ones. To address this limitation, we propose a combination of retrieval and fine-tuning: we can adapt the transformer to a local subset of the data by collecting nearest neighbours, and then perform task-specific fine-tuning with this retrieved set of neighbours in context. Using TabPFN as the base model -- currently the best tabular in-context learner -- and applying our retrieval and fine-tuning scheme on top results in what we call a locally-calibrated PFN, or LoCalPFN. We conduct extensive evaluation on 95 datasets curated by TabZilla from OpenML, upon which we…
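The retrieval step described in the abstract can be illustrated with a minimal sketch: for each query row, collect its nearest neighbours from the training set and hand only that local subset to the model as context. This is not the authors' implementation. The function names `retrieve_local_context` and `placeholder_in_context_predict` are hypothetical, Euclidean distance is an assumed choice of metric, and the distance-weighted vote is only a stand-in for the in-context model (TabPFN in the paper). The task-specific fine-tuning step is not shown.

```python
import math
from collections import defaultdict


def retrieve_local_context(train_rows, train_labels, query, k=5):
    """Return the k training rows (and labels) closest to the query.

    Euclidean distance is an assumption made for this sketch.
    """
    order = sorted(range(len(train_rows)),
                   key=lambda i: math.dist(train_rows[i], query))
    nearest = order[:k]
    return [train_rows[i] for i in nearest], [train_labels[i] for i in nearest]


def placeholder_in_context_predict(ctx_rows, ctx_labels, query):
    """Stand-in for the in-context learner: a distance-weighted class vote.

    A real pipeline would pass the retrieved context to the transformer here.
    """
    scores = defaultdict(float)
    for row, label in zip(ctx_rows, ctx_labels):
        # Closer neighbours get larger weights; epsilon avoids division by zero.
        scores[label] += 1.0 / (math.dist(row, query) + 1e-8)
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Toy data: two well-separated clusters with labels 0 and 1.
train_rows = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.0),
              (3.0, 3.1), (2.8, 3.0), (3.2, 2.9)]
train_labels = [0, 0, 0, 1, 1, 1]
query = (2.9, 3.0)

# Each query gets its own local context instead of the full training set.
ctx_rows, ctx_labels = retrieve_local_context(train_rows, train_labels, query, k=3)
probs = placeholder_in_context_predict(ctx_rows, ctx_labels, query)
predicted = max(probs, key=probs.get)
```

Because the context is rebuilt per query, its size is bounded by `k` rather than by the size of the training set, which is what lets a context-limited model be applied to larger datasets.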
Taxonomy
Topics: Time Series Analysis and Forecasting · Human Pose and Action Recognition · Anomaly Detection Techniques and Applications
Methods: Sparse Evolutionary Training · Balanced Selection · tabular data Prior-data Fitted Network
