ModelTables: A Corpus of Tables about Models

Zhengyuan Dong; Victor Zhong; Renée J. Miller

arXiv:2512.16106 · cs.DB · December 19, 2025

TL;DR

ModelTables is a large benchmark of structured performance and configuration tables about AI models, used to evaluate table search methods for retrieving and comparing related models.

Contribution

This paper introduces ModelTables, the first large-scale benchmark of structured model-related tables. It links each table to its surrounding model and publication context and uses the corpus to evaluate table search techniques.

Findings

01. Table-based dense retrieval achieves 66.5% P@1
02. Semantic table retrieval attains 54.8% P@1 overall
03. Hybrid retrieval methods show promising results
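
The P@1 figures above can be read as precision at rank 1: the fraction of query tables whose top-ranked result is a related table according to the ground truth. The sketch below assumes that standard definition; the function name `precision_at_1`, the table IDs, and the toy rankings are hypothetical and are not taken from the paper or its released code.

```python
from typing import Dict, List, Set


def precision_at_1(rankings: Dict[str, List[str]],
                   related: Dict[str, Set[str]]) -> float:
    """Fraction of queries whose top-ranked table is in the ground truth."""
    if not rankings:
        raise ValueError("need at least one query")
    hits = 0
    for query, ranked in rankings.items():
        # A query with an empty result list counts as a miss.
        if ranked and ranked[0] in related.get(query, set()):
            hits += 1
    return hits / len(rankings)


# Hypothetical table IDs, for illustration only.
rankings = {
    "t1": ["t7", "t3"],   # top result t7 is related: hit
    "t2": ["t9", "t4"],   # related table t4 is only at rank 2: miss
    "t3": ["t1"],         # hit
    "t4": [],             # nothing retrieved: miss
}
related = {
    "t1": {"t7"},
    "t2": {"t4"},
    "t3": {"t1", "t5"},
    "t4": {"t2"},
}

score = precision_at_1(rankings, related)
print(f"P@1 = {score:.1%}")  # prints "P@1 = 50.0%"
```

Only the first result of each ranking matters here, so a method that places a related table at rank 2 gets no credit for that query.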

Abstract

We present ModelTables, a benchmark of tables in Model Lakes that captures the structured semantics of performance and configuration tables often overlooked by text-only retrieval. The corpus is built from Hugging Face model cards, GitHub READMEs, and referenced papers, linking each table to its surrounding model and publication context. Compared with open data lake tables, model tables are smaller yet exhibit denser inter-table relationships, reflecting tightly coupled model and benchmark evolution. The current release covers over 60K models and 90K tables. To evaluate model and table relatedness, we construct a multi-source ground truth using three complementary signals: (1) paper citation links, (2) explicit model card links and inheritance, and (3) shared training datasets. We present one extensive empirical use case for the benchmark: table search. We compare canonical Data…

Taxonomy

Topics: Data Quality and Management · Data Visualization and Analytics · Research Data Management Practices