Table Foundation Models: on knowledge pre-training for tabular learning
Myung Jun Kim, Félix Lefebvre, Gaëtan Brison, Alexandre Perez-Lebel, and Gaël Varoquaux

TL;DR
This paper introduces TARTE, a knowledge-enhanced foundation model for tabular data. It improves prediction accuracy and efficiency by providing versatile pre-trained representations that can be fine-tuned and specialized to a domain.
Contribution
TARTE is a novel pre-trained foundation model for tables that captures semantics through string-based representations, enabling better downstream performance and reusability.
Findings
TARTE improves state-of-the-art prediction accuracy on tabular tasks.
Pre-trained representations facilitate efficient downstream learning.
TARTE enables domain-specific adaptation with minimal additional training.
Abstract
Table foundation models bring high hopes to data science: pre-trained on tabular data to embark knowledge or priors, they should facilitate downstream tasks on tables. One specific challenge is that of data semantics: numerical entries take their meaning from context, e.g., column name. Pre-trained neural networks that jointly model column names and table entries have recently boosted prediction accuracy. While these models outline the promises of world knowledge to interpret table values, they lack the convenience of popular foundation models in text or vision. Indeed, they must be fine-tuned to bring benefits, come with sizeable computation costs, and cannot easily be reused or combined with other architectures. Here we introduce TARTE, a foundation model that transforms tables to knowledge-enhanced vector representations using the string to capture semantics. Pre-trained on large…
Taxonomy
Topics: Data Quality and Management · Data Visualization and Analytics · Computational and Text Analysis Methods
