Construction of the Literature Graph in Semantic Scholar
Waleed Ammar, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, Rodney Kinney, Sebastian Kohlmeier, Kyle Lo, Tyler Murray, Hsu-Han Ooi, Matthew Peters, Joanna Power, Sam Skjonsberg

TL;DR
This paper presents a scalable system that constructs a comprehensive literature graph from scientific publications, integrating papers, authors, and entities to enhance information retrieval and discovery in Semantic Scholar.
Contribution
It introduces a large-scale, heterogeneous literature graph and adapts NLP tasks for its construction, addressing unique challenges in scientific data processing.
Findings
Constructed a graph with over 280 million nodes
Achieved effective entity extraction and linking in scientific texts
Enabled semantic features in Semantic Scholar platform
Abstract
We describe a deployed scalable system for organizing published scientific literature into a heterogeneous graph to facilitate algorithmic manipulation and discovery. The resulting literature graph consists of more than 280M nodes, representing papers, authors, entities and various interactions between them (e.g., authorships, citations, entity mentions). We reduce literature graph construction into familiar NLP tasks (e.g., entity extraction and linking), point out research challenges due to differences from standard formulations of these tasks, and report empirical results for each task. The methods described in this paper are used to enable semantic features in www.semanticscholar.org.
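To make the structure described in the abstract concrete, here is a minimal sketch of a heterogeneous graph with typed nodes (papers, authors, entities) and typed edges (authorships, citations, entity mentions). It is only an illustration under stated assumptions: the class `LiteratureGraph`, the identifier scheme (`paper:1`, `author:a`, ...), and the edge directions are hypothetical and are not taken from the paper or from the Semantic Scholar codebase.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Node and edge types named in the abstract. The (source, target)
# direction chosen for each edge type is an assumption of this sketch.
NODE_TYPES = {"paper", "author", "entity"}
EDGE_TYPES = {
    "authorship": ("author", "paper"),
    "citation": ("paper", "paper"),
    "entity_mention": ("paper", "entity"),
}


@dataclass
class LiteratureGraph:
    """Hypothetical in-memory heterogeneous graph (illustrative only)."""

    # node id -> node type
    nodes: dict = field(default_factory=dict)
    # (source id, edge type) -> set of target ids
    edges: dict = field(default_factory=lambda: defaultdict(set))

    def add_node(self, node_id, node_type):
        if node_type not in NODE_TYPES:
            raise ValueError(f"unknown node type: {node_type}")
        self.nodes[node_id] = node_type

    def add_edge(self, src, edge_type, dst):
        # Reject edges whose endpoints do not match the declared types.
        src_type, dst_type = EDGE_TYPES[edge_type]
        if self.nodes.get(src) != src_type or self.nodes.get(dst) != dst_type:
            raise ValueError(f"{edge_type} must link {src_type} -> {dst_type}")
        self.edges[(src, edge_type)].add(dst)

    def neighbors(self, src, edge_type):
        # Sorted for a deterministic result; empty list if no such edges.
        return sorted(self.edges.get((src, edge_type), set()))


def build_example():
    """Build a tiny graph with one edge of each type."""
    g = LiteratureGraph()
    g.add_node("paper:1", "paper")
    g.add_node("paper:2", "paper")
    g.add_node("author:a", "author")
    g.add_node("entity:x", "entity")
    g.add_edge("author:a", "authorship", "paper:1")
    g.add_edge("paper:1", "citation", "paper:2")
    g.add_edge("paper:1", "entity_mention", "entity:x")
    return g
```

Calling `build_example().neighbors("paper:1", "citation")` returns the papers cited by `paper:1`. Checking endpoint types when an edge is added is one simple way to keep a heterogeneous graph consistent; a system at the scale reported in the abstract would presumably rely on distributed storage rather than in-memory dictionaries.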
Taxonomy
Topics: Topic Modeling · Semantic Web and Ontologies · Biomedical Text Mining and Ontologies
