SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples
Michael Färber, David Lamprecht, Johan Krause, Linn Aung, Peter Haase

TL;DR
SemOpenAlex is a comprehensive, openly accessible RDF knowledge graph with over 26 billion triples, enabling advanced scientific data analysis, semantic search, and recommender systems across disciplines.
Contribution
The paper introduces SemOpenAlex, an open scientific knowledge graph of over 26 billion RDF triples, with multiple data access methods (RDF dump files, a SPARQL endpoint, resolvable URIs) and applications in analytics, recommendation, and benchmarking.
Findings
Supports large-scale scientific impact analysis
Enables semantic search and scholarly recommendations
Serves as a benchmark for RDF query optimization
Abstract
We present SemOpenAlex, an extensive RDF knowledge graph that contains over 26 billion triples about scientific publications and their associated entities, such as authors, institutions, journals, and concepts. SemOpenAlex is licensed under CC0, providing free and open access to the data. We offer the data through multiple channels, including RDF dump files, a SPARQL endpoint, and as a data source in the Linked Open Data cloud, complete with resolvable URIs and links to other data sources. Moreover, we provide embeddings for knowledge graph entities using high-performance computing. SemOpenAlex enables a broad range of use-case scenarios, such as exploratory semantic search via our website, large-scale scientific impact quantification, and other forms of scholarly big data analytics within and across scientific disciplines. Additionally, it enables academic recommender systems, such as…
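As an illustration of the SPARQL access channel mentioned in the abstract, the following minimal sketch builds a query that counts instances of a class and encodes it as a GET request URL. The endpoint URL and the class IRI are assumptions made for illustration and are not confirmed by the text above; check the project's documentation before relying on them. No network request is made.

```python
from urllib.parse import urlencode

# Assumed endpoint URL (not stated in the abstract); verify before use.
ENDPOINT = "https://semopenalex.org/sparql"

# Hypothetical class IRI for publications; the real ontology may differ.
WORK_CLASS = "https://semopenalex.org/ontology/Work"


def build_count_query(class_iri: str) -> str:
    """Return a SPARQL query that counts the instances of a class."""
    return (
        "SELECT (COUNT(?s) AS ?n) WHERE { "
        f"?s a <{class_iri}> . "
        "}"
    )


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a URL parameter named 'query' (GET-style request)."""
    return f"{endpoint}?{urlencode({'query': query})}"


query = build_count_query(WORK_CLASS)
url = build_request_url(ENDPOINT, query)
```

The resulting `url` could be passed to any HTTP client. Counting over a graph of this size may be slow or subject to endpoint limits, so a `LIMIT`-bounded exploratory query is often a safer first step.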
Taxonomy
Topics: Semantic Web and Ontologies · Biomedical Text Mining and Ontologies · Topic Modeling
