SemRepo: A Knowledge Graph for Research Software and Its Scholarly Ecosystem
Abdul Rafay, Yuni Susanti, David Lamprecht, Michael Färber

TL;DR
SemRepo is a comprehensive RDF knowledge graph linking research software repositories with scholarly data, enabling integrated analysis of research artifacts, authors, and publications to support reproducibility and sustainability.
Contribution
The paper introduces SemRepo, a large-scale knowledge graph that unifies research software, scholarly profiles, and publications for advanced analysis.
Findings
Over 81 million triples describing nearly 200,000 GitHub repositories
Links repositories to scholarly profiles, publications, and artifacts
Enables complex queries across research software and scholarly data
Abstract
We present SemRepo, an RDF knowledge graph comprising over 81 million triples describing nearly 200,000 GitHub repositories associated with scientific research. SemRepo captures repository-level metadata, such as contributors, issues, and programming languages, and interlinks this information with external scholarly knowledge graphs. In particular, repository authors are linked to their profiles in SemOpenAlex, repositories are connected to scholarly publications in LPWC, and research artifacts, such as datasets and experiments, are linked via MLSea-KG. This integration enables queries that span publications and their scholarly artifacts, which are typically fragmented across separate platforms. SemRepo supports analyses that are difficult to perform with existing resources in isolation, including provenance reconstruction across repositories and publications, as well as the systematic…
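To make the idea of a query that spans repositories, publications, and artifacts more concrete, here is a minimal sketch in plain Python. It is not SemRepo's schema or API: the triples, the identifiers (repo:example-repo, author:alice, paper:p1, dataset:d1), and the predicate names (hasContributor, implementsPaper, usesDataset) are all hypothetical and chosen only for illustration. The real graph is RDF and would normally be queried with SPARQL; the sketch only mimics the join such a query performs.

```python
from typing import Iterator, List, Optional, Tuple

# Toy illustration only: every identifier and predicate below is hypothetical
# and is NOT taken from the SemRepo vocabulary.
Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("repo:example-repo", "hasContributor", "author:alice"),
    ("repo:example-repo", "usesLanguage", "Python"),
    ("repo:example-repo", "implementsPaper", "paper:p1"),
    ("paper:p1", "usesDataset", "dataset:d1"),
    ("repo:other-repo", "hasContributor", "author:bob"),
    ("repo:other-repo", "implementsPaper", "paper:p2"),
]


def match(
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching a pattern; None acts as a wildcard."""
    for subj, pred, obj in TRIPLES:
        if (s is None or subj == s) and (p is None or pred == p) and (o is None or obj == o):
            yield (subj, pred, obj)


def datasets_for_contributor(author: str) -> List[Tuple[str, str, str]]:
    """Follow the path contributor -> repository -> paper -> dataset."""
    results = []
    for repo, _, _ in match(p="hasContributor", o=author):
        for _, _, paper in match(s=repo, p="implementsPaper"):
            for _, _, dataset in match(s=paper, p="usesDataset"):
                results.append((repo, paper, dataset))
    return results


if __name__ == "__main__":
    print(datasets_for_contributor("author:alice"))
```

Each nested loop corresponds to one triple pattern in a graph query. The traversal only works because the repository, paper, and dataset identifiers are shared across what would otherwise be separate sources, which is the kind of interlinking the abstract describes.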
