PubGraph: A Large-Scale Scientific Knowledge Graph
Kian Ahrabian, Xinwei Du, Richard Delwin Myloth, Arun Baalaaji Sankar Ananthan, Jay Pujara

TL;DR
PubGraph is a large-scale scientific knowledge graph that unifies data from Wikidata, OpenAlex, and Semantic Scholar and supports reasoning and benchmarking tasks for analyzing scientific progress.
Contribution
We introduce PubGraph, a large-scale, unified scientific knowledge graph with extensive metadata, auxiliary data, and benchmarks for knowledge graph completion tasks.
Findings
PubGraph contains more than 385 million entities, 13 billion main edges, and 1.5 billion qualifier edges.
It includes auxiliary data from community detection algorithms and language models.
New benchmarks challenge existing knowledge graph embedding models.
Abstract
Research publications are the primary vehicle for sharing scientific progress in the form of new discoveries, methods, techniques, and insights. Unfortunately, the lack of a large-scale, comprehensive, and easy-to-use resource capturing the myriad relationships between publications, their authors, and venues presents a barrier to applications for gaining a deeper understanding of science. In this paper, we present PubGraph, a new resource for studying scientific progress that takes the form of a large-scale knowledge graph (KG) with more than 385M entities, 13B main edges, and 1.5B qualifier edges. PubGraph is comprehensive and unifies data from various sources, including Wikidata, OpenAlex, and Semantic Scholar, using the Wikidata ontology. Beyond the metadata available from these sources, PubGraph includes outputs from auxiliary community detection algorithms and large language…
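The distinction between main edges and qualifier edges can be illustrated with a minimal sketch, assuming the Wikidata-style convention in which a statement (subject, property, object) is a main edge and any annotation attached to that statement is a qualifier edge. The class names, entity identifiers, and property labels below are hypothetical and chosen only for illustration; they are not taken from PubGraph's actual schema or data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Qualifier:
    """An annotation attached to a statement (one qualifier edge)."""
    prop: str
    value: str


@dataclass
class Statement:
    """A subject-property-object triple (one main edge) plus its qualifiers."""
    subject: str
    prop: str
    obj: str
    qualifiers: list = field(default_factory=list)


def count_edges(statements):
    """Return (number of main edges, number of qualifier edges)."""
    main = len(statements)
    qual = sum(len(s.qualifiers) for s in statements)
    return main, qual


# Hypothetical toy graph: one paper with an author, a citation, and a venue.
statements = [
    # The author link carries a qualifier recording the author's position.
    Statement("paper:A", "author", "person:X",
              [Qualifier("series ordinal", "1")]),
    Statement("paper:A", "cites work", "paper:B"),
    Statement("paper:A", "published in", "venue:V"),
]

main_edges, qualifier_edges = count_edges(statements)
print(main_edges, qualifier_edges)
```

In this toy graph there are three main edges and one qualifier edge; the counts reported for PubGraph follow the same split at a much larger scale.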
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Biomedical Text Mining and Ontologies
