Towards Universal Tabular Embeddings: A Benchmark Across Data Tasks
Liane Vogel, Kavitha Srinivas, Niharika D'Souza, Sola Shirai, Oktie Hassanzadeh, Horst Samulowitz

TL;DR
This paper introduces TEmBed, a comprehensive benchmark for evaluating tabular data embeddings across multiple levels, providing practical guidance for model selection and advancing towards universal tabular representations.
Contribution
It presents a systematic benchmark for comparing tabular embedding models across four representation levels (cell, row, column, and table) and across tasks, addressing the lack of standardized evaluation.
Findings
Model performance varies depending on task and representation level.
The benchmark reveals strengths and weaknesses of diverse models.
The results guide the practical selection of tabular embeddings for real-world applications.
Abstract
Tabular foundation models aim to learn universal representations of tabular data that transfer across tasks and domains, enabling applications such as table retrieval, semantic search and table-based prediction. Despite the growing number of such models, it remains unclear which approach works best in practice, as existing methods are often evaluated under task-specific settings that make direct comparison difficult. To address this, we introduce TEmBed, the Tabular Embedding Test Bed, a comprehensive benchmark for systematically evaluating tabular embeddings across four representation levels: cell, row, column, and table. Evaluating a diverse set of tabular representation learning models, we show that which model to use depends on the task and representation level. Our results offer practical guidance for selecting tabular embeddings in real-world applications and lay the groundwork…
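To make the four representation levels named in the abstract concrete, here is a minimal sketch of what an embedding at each level could look like for one small table. Everything in it is an illustrative assumption: `embed_text` is a hypothetical hash-seeded stand-in for a real embedding model, and mean pooling is only one simple way to aggregate cells. It is not TEmBed's code, nor how any of the benchmarked models derive their representations.

```python
import hashlib

import numpy as np

DIM = 8  # hypothetical embedding size, chosen only for this sketch


def embed_text(text: str, dim: int = DIM) -> np.ndarray:
    """Toy stand-in for a real model: a deterministic, hash-seeded unit vector."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "big")
    vec = np.random.default_rng(seed).standard_normal(dim)
    return vec / np.linalg.norm(vec)


def embed_table(header, rows):
    """Return embeddings at cell, row, column, and table level.

    Assumption: each cell is serialized as "<column name>: <value>" and the
    higher levels are obtained by mean-pooling the cell vectors.
    """
    cells = np.stack([
        np.stack([embed_text(f"{col}: {val}") for col, val in zip(header, row)])
        for row in rows
    ])  # shape: (n_rows, n_cols, dim)
    return {
        "cell": cells,                      # one vector per cell
        "row": cells.mean(axis=1),          # pool over the columns of each row
        "column": cells.mean(axis=0),       # pool over the rows of each column
        "table": cells.mean(axis=(0, 1)),   # pool over every cell
    }


header = ["city", "country"]
rows = [["Paris", "France"], ["Berlin", "Germany"], ["Rome", "Italy"]]
levels = embed_table(header, rows)

for name, arr in levels.items():
    print(name, arr.shape)
```

A learned model would typically produce these vectors directly rather than by pooling; the sketch only shows how the four levels differ in granularity and in the shape of their output.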
