ConTextTab: A Semantics-Aware Tabular In-Context Learner
Marco Spinaci, Marek Polewczyk, Maximilian Schambach, Sam Thelin

TL;DR
ConTextTab is a semantics-aware tabular in-context learning model that combines semantic understanding with an efficient table-native architecture. It is trained on real-world data and reports state-of-the-art results on multiple benchmarks.
Contribution
The paper introduces ConTextTab, which integrates semantic understanding into a table-native ICL framework trained on real-world data and is reported to outperform existing models.
Findings
Achieves SOTA performance on multiple tabular benchmarks.
Sets a new standard on the CARTE benchmark.
Combines semantic understanding with architectural efficiency.
Abstract
Tabular in-context learning (ICL) has recently achieved state-of-the-art (SOTA) performance on several tabular prediction tasks. Previously restricted to classification problems on small tables, recent advances such as TabPFN and TabICL have extended its use to larger datasets. Although current table-native ICL architectures are architecturally efficient and well-adapted to tabular data structures, their exclusive training on synthetic data limits their ability to fully leverage the rich semantics and world knowledge contained in real-world tabular data. At the other end of the spectrum, tabular ICL models based on pretrained large language models such as TabuLa-8B integrate deep semantic understanding and world knowledge but are only able to make use of a small amount of context due to inherent architectural limitations. With the aim to combine the best of both these worlds, we…
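To make the idea of tabular in-context learning concrete, here is a minimal sketch of the prediction interface the abstract describes: labeled context rows and unlabeled query rows go in together, and predictions come out of a single forward pass with no gradient updates. This is not ConTextTab's architecture, nor that of TabPFN or TabICL. The function name `icl_predict`, the `temperature` parameter, and the distance-based attention are all assumptions chosen for illustration, and the toy handles numeric features only, so it has none of the semantic understanding the paper targets.

```python
import math
from collections import defaultdict


def icl_predict(context_rows, context_labels, query_rows, temperature=1.0):
    """Predict a label for each query row by attending over labeled context rows.

    Hypothetical toy: a softmax over negative squared distances stands in for
    the learned attention of a real table-native ICL model.
    """
    predictions = []
    for q in query_rows:
        # Similarity score: negative squared Euclidean distance to each context row.
        scores = [
            -sum((a - b) ** 2 for a, b in zip(q, c)) / temperature
            for c in context_rows
        ]
        # Softmax over context rows, shifted by the max for numerical stability.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Accumulate normalized attention mass per label.
        votes = defaultdict(float)
        for w, label in zip(weights, context_labels):
            votes[label] += w / total
        predictions.append(max(votes, key=votes.get))
    return predictions


# The "context" plays the role of the training table; nothing is fitted.
context_rows = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.9, 1.1]]
context_labels = ["low", "low", "high", "high"]
query_rows = [[0.1, 0.1], [1.0, 1.0]]

print(icl_predict(context_rows, context_labels, query_rows))
```

With the toy data above, the first query sits next to the "low" rows and the second next to the "high" rows, so the call prints `['low', 'high']`. The amount of context such a model can attend over is the limitation the abstract attributes to LLM-based approaches such as TabuLa-8B.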
Taxonomy
Topics: Data Quality and Management · Machine Learning in Healthcare · Imbalanced Data Classification Techniques
Methods: tabular data Prior-data Fitted Network · Sparse Evolutionary Training
