ModelTables: A Corpus of Tables about Models
Zhengyuan Dong, Victor Zhong, Renée J. Miller

TL;DR
ModelTables is a large benchmark of structured tables about AI models, built to evaluate how well table search methods support semantic retrieval and comparison of models.
Contribution
This paper introduces ModelTables, the first large-scale benchmark of structured model-related tables. It links each table to its surrounding model and publication context and uses the corpus to evaluate table search techniques.
Findings
Table-based dense retrieval achieves 66.5% P@1
Semantic table retrieval attains 54.8% P@1 overall
Hybrid retrieval methods show promising results
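The findings above are reported as P@1. This page does not define the metric, so the sketch below assumes the usual reading: Precision at rank 1, the fraction of queries whose top-ranked result is marked as related in the ground truth. The function name, the dictionary layout, and the table identifiers are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def precision_at_1(
    rankings: Dict[str, List[str]],
    relevant: Dict[str, Set[str]],
) -> float:
    """Fraction of queries whose top-ranked table is in the query's relevant set.

    rankings: query table id -> retrieved table ids, best first (hypothetical layout).
    relevant: query table id -> ground-truth related table ids (hypothetical layout).
    """
    if not rankings:
        return 0.0
    hits = 0
    for query, ranked in rankings.items():
        # A query with no results counts as a miss.
        if ranked and ranked[0] in relevant.get(query, set()):
            hits += 1
    return hits / len(rankings)


# Toy data for illustration only; these are not results from the benchmark.
toy_rankings = {
    "table_a": ["table_b", "table_c"],
    "table_d": ["table_x", "table_e"],
    "table_f": ["table_g"],
}
toy_relevant = {
    "table_a": {"table_b"},
    "table_d": {"table_e"},
    "table_f": {"table_g", "table_h"},
}
toy_score = precision_at_1(toy_rankings, toy_relevant)
```

In this toy setup two of the three queries have a related table at rank 1, so the score is 2/3. Only the top result matters for P@1: the query table_d is a miss even though a related table appears at rank 2.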
Abstract
We present ModelTables, a benchmark of tables in Model Lakes that captures the structured semantics of performance and configuration tables often overlooked by text-only retrieval. The corpus is built from Hugging Face model cards, GitHub READMEs, and referenced papers, linking each table to its surrounding model and publication context. Compared with open data lake tables, model tables are smaller yet exhibit denser inter-table relationships, reflecting tightly coupled model and benchmark evolution. The current release covers over 60K models and 90K tables. To evaluate model and table relatedness, we construct a multi-source ground truth using three complementary signals: (1) paper citation links, (2) explicit model card links and inheritance, and (3) shared training datasets. We present one extensive empirical use case for the benchmark: table search. We compare canonical Data…
Taxonomy
Topics: Data Quality and Management · Data Visualization and Analytics · Research Data Management Practices
