The Semantic Scholar Open Data Platform
Rodney Kinney, Chloe Anastasiades, Russell Authur, Iz Beltagy, Jonathan Bragg, Alexandra Buraczynski, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Arman Cohan, Miles Crawford, Doug Downey, Jason Dunkelberger, Oren Etzioni, Rob Evans, Sergey Feldman, Joseph Gorney

TL;DR
Semantic Scholar's open data platform leverages advanced extraction and knowledge graph techniques to create the largest open scientific literature graph, facilitating discovery and understanding in scientific research.
Contribution
This paper introduces the Semantic Scholar Academic Graph, combining diverse data sources with state-of-the-art extraction and semantic features, and details the platform's data processing pipeline and APIs.
Findings
Built the largest open scientific literature graph with 200M+ papers
Integrated advanced semantic features like text parsing and embeddings
Provided comprehensive APIs for data access and analysis
Abstract
The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction to build the Semantic Scholar Academic Graph, the largest open scientific literature graph to-date, with 200M+ papers, 80M+ authors, 550M+ paper-authorship edges, and 2.4B+ citation edges. The graph includes advanced semantic features such as structurally parsed text, natural language summaries, and vector embeddings. In this paper, we describe the components of the S2 data processing pipeline and the associated APIs offered by the…
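The graph described in the abstract has two node types (papers and authors) and two edge types (paper-authorship and citation). The following is a minimal in-memory sketch of that shape, not the S2 implementation: the class name `AcademicGraph`, its methods, and the sample IDs are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class AcademicGraph:
    """Toy literature graph: paper and author nodes, two edge types."""
    papers: dict = field(default_factory=dict)      # paper_id -> title
    authors: dict = field(default_factory=dict)     # author_id -> name
    authorship: set = field(default_factory=set)    # (paper_id, author_id)
    citations: set = field(default_factory=set)     # (citing_id, cited_id)

    def add_author(self, author_id, name):
        self.authors[author_id] = name

    def add_paper(self, paper_id, title, author_ids=()):
        # Register the paper and one authorship edge per listed author.
        self.papers[paper_id] = title
        for author_id in author_ids:
            self.authorship.add((paper_id, author_id))

    def add_citation(self, citing_id, cited_id):
        # Directed edge: citing paper -> cited paper.
        self.citations.add((citing_id, cited_id))

    def citation_counts(self):
        # Number of incoming citation edges per cited paper.
        counts = defaultdict(int)
        for _, cited_id in self.citations:
            counts[cited_id] += 1
        return dict(counts)

    def papers_by_author(self, author_id):
        # Follow authorship edges from an author back to their papers.
        return sorted(p for p, a in self.authorship if a == author_id)


# Hypothetical sample data, for illustration only.
graph = AcademicGraph()
graph.add_author("a1", "Author One")
graph.add_author("a2", "Author Two")
graph.add_paper("p1", "First paper", ["a1"])
graph.add_paper("p2", "Second paper", ["a1", "a2"])
graph.add_paper("p3", "Third paper", ["a2"])
graph.add_citation("p2", "p1")
graph.add_citation("p3", "p1")
graph.add_citation("p3", "p2")

counts = graph.citation_counts()
a1_papers = graph.papers_by_author("a1")
```

Storing edges as sets of ID pairs keeps each edge type separate and deduplicated, which mirrors how the abstract reports authorship and citation edges as distinct counts.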
Taxonomy
Topics: Semantic Web and Ontologies · Data Quality and Management · Topic Modeling
