Mixture of In-Context Prompters for Tabular PFNs
Derek Xu, Olcay Cirit, Reza Asadi, Yizhou Sun, Wei Wang

TL;DR
MIXTUREPFN combines nearest-neighbor prompting with bootstrapped finetuning to scale in-context learning (ICL) to larger tabular datasets, outperforming strong deep learning and tree-based baselines across 36 diverse datasets.
Contribution
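It extends nearest-neighbor sampling to the state-of-the-art tabular ICL model and uses bootstrapping to finetune that model on the inference-time dataset, so that ICL can run on large tabular datasets without placing the whole dataset in its quadratic-cost context.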
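The contribution pairs nearest-neighbor sampling with bootstrapping. As a rough illustration only, the sketch below assembles finetuning examples by drawing query rows from the training set with replacement and using each query's k nearest neighbors (excluding itself) as its prompt. Euclidean distance, a fixed prompt size `k`, and the helper names `nearest_neighbors` and `bootstrap_finetune_examples` are all assumptions; this is not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Row = Sequence[float]
# (context of (row, label) pairs, (query row, query label))
Example = Tuple[List[Tuple[Row, int]], Tuple[Row, int]]


def nearest_neighbors(X: List[Row], query: Row, k: int, exclude: int = -1) -> List[int]:
    """Indices of the k rows of X closest to `query` (Euclidean), skipping index `exclude`."""
    candidates = [i for i in range(len(X)) if i != exclude]
    # sorted() is stable, so ties keep their original index order
    return sorted(candidates, key=lambda i: math.dist(X[i], query))[:k]


def bootstrap_finetune_examples(
    X: List[Row], y: List[int], k: int, n_samples: int, seed: int = 0
) -> List[Example]:
    """Hypothetical helper: build finetuning examples from the training set itself.

    Each example draws a query row with replacement (the bootstrap step) and uses
    its k nearest neighbors, excluding the query itself, as the in-context prompt.
    """
    rng = random.Random(seed)
    examples: List[Example] = []
    for _ in range(n_samples):
        q = rng.randrange(len(X))  # sample a query index with replacement
        ctx = nearest_neighbors(X, X[q], k, exclude=q)
        examples.append(([(X[i], y[i]) for i in ctx], (X[q], y[q])))
    return examples
```

For example, with `X = [[0.0], [1.0], [2.0], [10.0]]`, `nearest_neighbors(X, [0.9], 2)` returns `[1, 0]`. Excluding the query row from its own prompt keeps the model from simply copying the query's label during finetuning.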
Findings
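Is the Condorcet winner against 19 strong deep learning and tree-based baselines across 36 datasets
Achieves the highest mean rank among the top 10 algorithms, with statistical significance
Scales ICL to larger datasets on which existing tabular ICL cannot run without severely compromising performance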
Abstract
Recent benchmarks found that In-Context Learning (ICL) outperforms both deep learning and tree-based algorithms on small tabular datasets. However, on larger datasets, ICL for tabular learning cannot run without severely compromising performance, due to its quadratic space and time complexity w.r.t. dataset size. We propose MIXTUREPFN, which extends nearest-neighbor sampling to the state-of-the-art ICL model for tabular learning and uses bootstrapping to finetune that model on the inference-time dataset. MIXTUREPFN is the Condorcet winner across 36 diverse tabular datasets against 19 strong deep learning and tree-based baselines, achieving the highest mean rank among the top 10 of these algorithms with statistical significance.
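To make the quadratic-complexity argument concrete, here is a back-of-the-envelope sketch. It assumes plain self-attention in which every context row attends to every other row (real tabular PFNs may mask attention differently), and the dataset sizes are illustrative, not taken from the paper. The function names are hypothetical.

```python
def attention_entries(context_rows: int) -> int:
    """Score-matrix entries if every row in the context attends to every other row."""
    return context_rows * context_rows


def full_context_entries(n_train: int, n_test: int) -> int:
    """One prompt holding the whole training set plus all test rows."""
    return attention_entries(n_train + n_test)


def per_query_prompt_entries(prompt_size: int) -> int:
    """Peak entries for one test row whose prompt holds `prompt_size` retrieved training rows."""
    return attention_entries(prompt_size + 1)


if __name__ == "__main__":
    n_train, n_test, prompt_size = 100_000, 1_000, 1_000
    full = full_context_entries(n_train, n_test)
    peak = per_query_prompt_entries(prompt_size)
    total = n_test * peak  # total work if test rows are processed one prompt at a time
    print(f"full context: {full:,} entries")
    print(f"per-query prompt: {peak:,} entries at peak, {total:,} in total")
```

With 100,000 training rows the full-context score matrix has about 10.2 billion entries, while a 1,000-row prompt peaks at about one million. Total work still grows linearly with the number of test rows, which is the trade-off that bounded prompts make in exchange for a fixed peak memory.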
Peer Reviews
Decision: ICLR 2025 Poster
The paper is well written and proposes a justified solution to the context-length issue of in-context learning models such as TabPFN. The authors conduct extensive experiments on many real-world datasets to demonstrate the effectiveness of the proposed approach and compare it with leading tree-based and deep learning tabular methods.
There is a closely related previous work, "Retrieval & Fine-Tuning for In-Context Tabular Models" by Thomas et al., which proposes both nearest-neighbor retrieval to improve the prompt and fine-tuning with this retrieval to adapt the model to the target distribution. I think the authors have to compare with this work and highlight what is novel in MixturePFN.
- The MICP strategy effectively reduces memory usage, allowing the model to handle larger datasets than the existing TabPFN.
- The CAPFN bootstrapping and finetuning approach appears to be an effective way to mitigate distribution shift in ICL for tabular data.
- Extensive benchmarks against 19 strong baselines show good performance in both mean rank and Condorcet ranking across diverse datasets.
- While MIXTUREPFN improves dataset scalability, it still struggles with feature-rich datasets, potentially limiting its applicability in domains with high-dimensional data, such as patient healthcare data. I realize the authors leave this to future work, but this is an area where simple XGBoost performs quite well, and I would be curious about their thoughts on tackling this issue.
- MICP's reliance on K-Means clustering to segment data into meaningful clusters as the quality of clusters can v
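The concern about cluster quality can be made concrete with a minimal sketch of K-Means-based prompt construction. It assumes plain Lloyd's K-Means with Euclidean distance and that each test row is routed to the prompt of its nearest centroid; `kmeans`, `route`, and `build_cluster_prompts` are hypothetical names, not the paper's code.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Row = Sequence[float]


def route(x: Row, centroids: List[List[float]]) -> int:
    """Index of the centroid nearest to x (Euclidean)."""
    return min(range(len(centroids)), key=lambda c: math.dist(x, centroids[c]))


def kmeans(X: List[Row], k: int, iters: int = 20, seed: int = 0) -> List[List[float]]:
    """Plain Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(X, k)]  # initialise from k randomly chosen rows
    for _ in range(iters):
        assign = [route(x, centroids) for x in X]
        for c in range(k):
            members = [X[i] for i in range(len(X)) if assign[i] == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_cluster_prompts(
    X: List[Row], y: List[int], k: int, seed: int = 0
) -> Tuple[List[List[float]], Dict[int, List[Tuple[Row, int]]]]:
    """Hypothetical helper: one in-context prompt per K-Means cluster."""
    centroids = kmeans(X, k, seed=seed)
    prompts: Dict[int, List[Tuple[Row, int]]] = {c: [] for c in range(k)}
    for xi, yi in zip(X, y):
        prompts[route(xi, centroids)].append((xi, yi))
    return centroids, prompts
```

A test row `x` would then be classified using `prompts[route(x, centroids)]` as its context. Because cluster sizes come out of K-Means rather than being fixed, poorly separated data can yield very unbalanced or mixed prompts, which is one concrete way cluster quality could affect predictions.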
1. The idea of blending a Mixture of Experts into TabPFN seems novel.
2. The effectiveness of MixturePFN is well evaluated on well-established benchmarks against a variety of baseline methods.
3. The writing is easy to follow.
1. The biggest weakness, I think, is that the paper is missing a comparison with LoCalPFN [1]. Since LoCalPFN also aims to make TabPFN effective on many-shot datasets, I think it should be mentioned in the paper.

[1] Thomas et al., Retrieval & Fine-Tuning for In-Context Tabular Models, NeurIPS 2024
