ArtifactLinker: Linking Scientific Artifacts for Automatic State-of-the-Art Discovery

Haofei Yu; Jiaxuan You; Peter Clark; Bodhisattwa Prasad Majumder; Kyle Richardson

arXiv:2605.16902 · cs.LG · May 19, 2026

TL;DR

ArtifactLinker is a framework that uses graph neural networks (GNNs) and large language models (LLMs) to automatically discover the state-of-the-art model for a given dataset from the artifacts hosted on platforms such as HuggingFace.

Contribution

The paper introduces a two-stage approach to automatic SOTA discovery: it first ranks unobserved model-dataset links on an artifact graph, then verifies the top-ranked links through coding experiments run by LLM-based agents.

Findings

1. Graph structures effectively predict missing links between artifacts.
2. ArtifactLinker successfully discovers potential SOTA models and insights.
3. The ArtifactBench benchmark enables evaluation of artifact link prediction methods.

Abstract

Scientific artifacts such as models and datasets are foundations for research. With the rapid growth of platforms like HuggingFace, researchers now have access to a large number of artifacts. Yet, a key challenge remains: how can we automatically discover the state-of-the-art (SOTA) model for a given dataset by fully leveraging existing artifacts? We formalize this task as automatic SOTA discovery by modeling HuggingFace as an artifact graph, where nodes are models/datasets and edges represent evaluations. We propose ArtifactLinker, a two-stage framework: (1) ranking promising unobserved model-dataset links using Graph Neural Networks (GNNs) or graph-augmented Large Language Models (LLMs), and (2) verifying top-ranked links via coding experiments with LLM-based agents. We further introduce a benchmark named ArtifactBench with 14,053 artifacts and 51,337 relations to evaluate the…
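
To make the two-stage setup in the abstract concrete, here is a toy sketch of an artifact graph with a rank-then-verify loop. It is not the paper's method: a simple path-counting heuristic stands in for the GNN or graph-augmented LLM ranker, and the verification stage is a stub that calls a caller-supplied function instead of an LLM-based agent. All names and data (`observed`, `rank_unobserved`, `verify_top_k`, `model_a`, `dataset_x`, and so on) are hypothetical.

```python
from itertools import product

# Hypothetical toy data: each pair is an observed evaluation edge (model, dataset).
observed = [
    ("model_a", "dataset_x"),
    ("model_a", "dataset_y"),
    ("model_b", "dataset_x"),
    ("model_b", "dataset_y"),
    ("model_b", "dataset_z"),
    ("model_c", "dataset_z"),
]


def build_graph(edges):
    """Build adjacency maps for the bipartite model-dataset artifact graph."""
    model_to_datasets, dataset_to_models = {}, {}
    for model, dataset in edges:
        model_to_datasets.setdefault(model, set()).add(dataset)
        dataset_to_models.setdefault(dataset, set()).add(model)
    return model_to_datasets, dataset_to_models


def score_link(model, dataset, model_to_datasets, dataset_to_models):
    """Heuristic stand-in for a learned ranker (an assumption, not a GNN).

    Counts length-3 paths model -> dataset' -> model' -> dataset: models
    already evaluated on `dataset` vote with the number of datasets they
    share with `model`.
    """
    own = model_to_datasets.get(model, set())
    return sum(
        len(own & model_to_datasets.get(other, set()))
        for other in dataset_to_models.get(dataset, set())
    )


def rank_unobserved(edges):
    """Stage 1: score every unobserved model-dataset pair, best first."""
    model_to_datasets, dataset_to_models = build_graph(edges)
    seen = set(edges)
    scored = [
        (score_link(m, d, model_to_datasets, dataset_to_models), m, d)
        for m, d in product(sorted(model_to_datasets), sorted(dataset_to_models))
        if (m, d) not in seen
    ]
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[0], item[1], item[2]))


def verify_top_k(ranked, k, run_experiment):
    """Stage 2 stub: run a caller-supplied experiment on the top-k links."""
    return {(m, d): run_experiment(m, d) for _, m, d in ranked[:k]}


ranked = rank_unobserved(observed)
for score, model, dataset in ranked:
    print(score, model, dataset)
```

On this toy data the top-ranked unobserved link is `model_a` on `dataset_z` with a score of 2, because `model_b` has been evaluated on `dataset_z` and shares two datasets with `model_a`. In a real pipeline, `run_experiment` would be replaced by whatever actually executes the evaluation and returns a metric.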
