Mixture of In-Context Prompters for Tabular PFNs

Derek Xu; Olcay Cirit; Reza Asadi; Yizhou Sun; Wei Wang

arXiv:2405.16156·cs.LG·May 28, 2024·1 cites

TL;DR

MIXTUREPFN scales in-context learning (ICL) for tabular data to larger datasets by pairing nearest-neighbor context sampling with bootstrapped finetuning on the inference-time dataset. It is the Condorcet winner against 19 baselines across 36 datasets.

Contribution

MIXTUREPFN extends nearest-neighbor sampling to the state-of-the-art tabular ICL model and uses bootstrapping to finetune that model on the inference-time dataset, addressing ICL's quadratic space and time cost on large tabular datasets.

Findings

01 Outperforms 19 strong baselines across 36 datasets

02 Achieves the highest mean rank among the top-10 algorithms, with statistical significance

03 Scales to larger datasets without performance loss

Abstract

Recent benchmarks found In-Context Learning (ICL) outperforms both deep learning and tree-based algorithms on small tabular datasets. However, on larger datasets, ICL for tabular learning cannot run without severely compromising performance, due to its quadratic space and time complexity w.r.t. dataset size. We propose MIXTUREPFN, which both extends nearest-neighbor sampling to the state-of-the-art ICL for tabular learning model and uses bootstrapping to finetune said model on the inference-time dataset. MIXTUREPFN is the Condorcet winner across 36 diverse tabular datasets against 19 strong deep learning and tree-based baselines, achieving the highest mean rank among Top-10 aforementioned algorithms with statistical significance.
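
The two ideas named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the function names (`nearest_context`, `holdout_episode`, `attention_cost`), the Euclidean distance, and the hold-one-out resampling are all assumptions chosen for illustration, and the paper's actual bootstrapping and routing procedures may differ. The sketch shows (a) why a full-dataset context is costly when attention grows quadratically, (b) selecting only the k nearest training rows as context for a query, and (c) resampling training rows into (context, query) episodes that a model could be finetuned on.

```python
import math
import random


def attention_cost(n_context):
    """Number of pairwise attention entries for a context of n rows (quadratic)."""
    return n_context * n_context


def nearest_context(train_X, train_y, query, k):
    """Return the k training rows (and labels) closest to `query`.

    Assumption: plain Euclidean distance over numeric features.
    """
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    keep = order[:k]
    return [train_X[i] for i in keep], [train_y[i] for i in keep]


def holdout_episode(train_X, train_y, k, rng):
    """Build one hypothetical finetuning episode from the training set.

    A random row is held out as the query; its context is the k nearest
    rows among the remaining data. This is a simplified stand-in for the
    bootstrapping step described in the abstract.
    """
    q = rng.randrange(len(train_X))
    rest_X = train_X[:q] + train_X[q + 1:]
    rest_y = train_y[:q] + train_y[q + 1:]
    ctx_X, ctx_y = nearest_context(rest_X, rest_y, train_X[q], k)
    return ctx_X, ctx_y, train_X[q], train_y[q]
```

Under these assumptions, a query is answered by passing only `nearest_context(...)` to the ICL model, so the prompt size is fixed at k rows regardless of how large the training set is, while `holdout_episode(...)` is called repeatedly (for example with `random.Random(0)`) to generate finetuning prompts that resemble the prompts seen at inference time.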

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01·Rating 6·Confidence 2

Strengths

The paper is well written and proposes a justified solution to address the context length issue for in-context learning models such as TabPFN. The authors conduct extensive experiments on many real-world datasets to demonstrate the effectiveness of the proposed approach and compare with leading tree-based and deep learning tabular methods.

Weaknesses

There is a closely related previous work, "Retrieval & Fine-Tuning for In-Context Tabular Models" by Thomas et al., which proposes both nearest-neighbor retrieval to improve the prompt and finetuning with this approach to adapt the model to the target distribution. I think the authors have to compare with this work and highlight what is novel in MixturePFN.

Reviewer 02·Rating 8·Confidence 2

Strengths

- The MICP strategy effectively reduces memory usage, allowing the model to handle larger datasets than the existing TabPFN.
- The CAPFN bootstrapping and finetuning approach appears to be an effective way to mitigate distribution shift in ICL for tabular data.
- Extensive benchmarks against 19 strong baselines show good performance in both mean rank and Condorcet ranking across diverse datasets.

Weaknesses

- While MIXTUREPFN improves dataset scalability, it still struggles with feature-rich datasets, potentially limiting its applicability in domains with high-dimensional data, such as patient healthcare data. I realize the authors leave this to future work, but this is an area where simple XGBoost performs quite well, and I would be curious about their thoughts on tackling this issue.
- MICP's reliance on K-Means clustering to segment data into meaningful clusters as the quality of clusters can v

Reviewer 03·Rating 6·Confidence 3

Strengths

1. The idea of Mixture of Experts blending into TabPFN seems novel.
2. The effectiveness of MixturePFN is well evaluated in well-established benchmarks against a variety of baseline methods.
3. Writing is easy to follow.

Weaknesses

1. The biggest weakness, I think, is that the paper is missing a comparison with LoCalPFN [1]. Since LoCalPFN also tries to make TabPFN effective even on datasets with many shots, I think it should be mentioned in the paper.

[1] Thomas et al., Retrieval & Fine-Tuning for In-Context Tabular Models, NeurIPS 2024
