TabICLv2: A better, faster, scalable, and open tabular foundation model
Jingang Qu, David Holzmüller, Gaël Varoquaux, Marine Le Morvan

TL;DR
TabICLv2 is a tabular foundation model for regression and classification that combines a new synthetic data generation engine, architectural changes such as a scalable softmax in attention, and an optimized pretraining protocol that replaces AdamW with the Muon optimizer. Without any tuning, it surpasses RealTabPFN-2.5 on the TabArena and TALENT benchmarks.
Contribution
Introduces TabICLv2, which combines a novel synthetic data generation engine, a scalable softmax in attention, and Muon-based pretraining to improve on prior tabular foundation models in both predictive performance and speed.
Findings
Outperforms state-of-the-art models on TabArena and TALENT benchmarks.
Generalizes effectively to million-scale datasets with moderate compute.
Faster and more scalable than previous models like RealTabPFN-2.5.
Abstract
Tabular foundation models, such as TabPFNv2 and TabICL, have recently dethroned gradient-boosted trees at the top of predictive benchmarks, demonstrating the value of in-context learning for tabular data. We introduce TabICLv2, a new state-of-the-art foundation model for regression and classification built on three pillars: (1) a novel synthetic data generation engine designed for high pretraining diversity; (2) various architectural innovations, including a new scalable softmax in attention improving generalization to larger datasets without prohibitive long-sequence pretraining; and (3) optimized pretraining protocols, notably replacing AdamW with the Muon optimizer. On the TabArena and TALENT benchmarks, TabICLv2 without any tuning surpasses the performance of the current state of the art, RealTabPFN-2.5 (hyperparameter-tuned, ensembled, and fine-tuned on real data). With only…
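To make the "scalable softmax in attention" idea concrete, here is a minimal sketch in plain Python. It assumes the previously published Scalable-Softmax form, in which attention logits are multiplied by s · log(n) for n attended items. The abstract describes TabICLv2's version as new, so its exact form may differ. The names `scalable_softmax`, `peak_weight`, and the scalar `s` are hypothetical and chosen for illustration only.

```python
import math


def softmax(logits):
    """Standard softmax, stabilized by subtracting the maximum logit."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scalable_softmax(logits, s=1.0):
    """Softmax with logits scaled by s * log(n), n = number of attended items.

    Assumption: this is the earlier published Scalable-Softmax form, not
    necessarily the variant used in TabICLv2. `s` stands in for what would
    normally be a learnable scalar.
    """
    n = len(logits)
    # log(1) = 0 would erase the logits, so fall back to no scaling for n = 1.
    scale = s * math.log(n) if n > 1 else 1.0
    return softmax([scale * z for z in logits])


def peak_weight(fn, n):
    """Weight given to one relevant item (logit 1) among n - 1 others (logit 0)."""
    return fn([1.0] + [0.0] * (n - 1))[0]


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(n, peak_weight(softmax, n), peak_weight(scalable_softmax, n))
```

In this toy setting the plain softmax weight on the relevant item is e / (e + n - 1), which shrinks roughly as 1/n, while the scaled version gives n / (2n - 1), which stays near 0.5 for any context size. That is the intuition behind generalizing to larger datasets than those seen in pretraining: attention does not flatten out as the number of rows grows.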
Taxonomy
Topics: Machine Learning and Data Classification · Advanced Neural Network Applications · Explainable Artificial Intelligence (XAI)
