Benchmarking Recommendation, Classification, and Tracing Based on Hugging Face Knowledge Graph
Qiaosheng Chen, Kaijia Huang, Xiao Zhou, Weiqing Luo, Yuanning Cui, Gong Cheng

TL;DR
This paper introduces HuggingKG, a large-scale knowledge graph of ML resources built from Hugging Face, and HuggingBench, a benchmark derived from it for three IR tasks: resource recommendation, classification, and tracing.
Contribution
It constructs the first large-scale knowledge graph from Hugging Face resources and presents a multi-task IR benchmark built on it, enabling analyses such as tracing model evolution and recommending relevant datasets.
Findings
HuggingKG contains 2.6 million nodes and 6.2 million edges.
HuggingBench provides three novel IR test collections.
Experiments show unique characteristics of HuggingKG and the benchmark.
Abstract
The rapid growth of open-source machine learning (ML) resources, such as models and datasets, has accelerated IR research. However, existing platforms like Hugging Face do not explicitly utilize structured representations, limiting advanced queries and analyses such as tracing model evolution and recommending relevant datasets. To fill the gap, we construct HuggingKG, the first large-scale knowledge graph built from the Hugging Face community for ML resource management. With 2.6 million nodes and 6.2 million edges, HuggingKG captures domain-specific relations and rich textual attributes. It enables us to further present HuggingBench, a multi-task benchmark with three novel test collections for IR tasks including resource recommendation, classification, and tracing. Our experiments reveal unique characteristics of HuggingKG and the derived tasks. Both resources are publicly available,…
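To make the idea of tracing over a knowledge graph concrete, here is a minimal sketch in Python. It is not the paper's code or schema: the node identifiers, the relation names (`finetuned_from`, `quantized_from`, `trained_on`, `likes`), and the helper functions are all hypothetical, chosen only to illustrate how typed edges allow a query like "which models was this one derived from?".

```python
from collections import defaultdict

# Hypothetical (head, relation, tail) triples; all names are illustrative
# assumptions, not the actual HuggingKG schema.
TRIPLES = [
    ("model:base-a", "trained_on", "dataset:corpus-x"),
    ("model:tuned-b", "finetuned_from", "model:base-a"),
    ("model:quant-c", "quantized_from", "model:tuned-b"),
    ("user:alice", "likes", "model:tuned-b"),
]

# Relations treated as "derived from" links when tracing (an assumption).
DERIVATION_RELATIONS = {"finetuned_from", "quantized_from"}


def build_index(triples):
    """Map each head node to its outgoing (relation, tail) edges."""
    index = defaultdict(list)
    for head, relation, tail in triples:
        index[head].append((relation, tail))
    return index


def trace_lineage(index, model):
    """Follow derivation edges from `model` back to its earliest ancestor.

    Returns the ancestors in order, nearest first. If a node has several
    derivation edges, only the first is followed; a visited set guards
    against cycles.
    """
    lineage = []
    seen = {model}
    current = model
    while True:
        parents = [
            tail
            for relation, tail in index.get(current, [])
            if relation in DERIVATION_RELATIONS
        ]
        if not parents or parents[0] in seen:
            break
        current = parents[0]
        seen.add(current)
        lineage.append(current)
    return lineage
```

Calling `trace_lineage(build_index(TRIPLES), "model:quant-c")` walks the two derivation edges and returns the tuned model followed by the base model. The same edge index could also feed a recommender (for example, via `likes` edges), though that is beyond this sketch.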
