Overview of STEM Science as Process, Method, Material, and Data Named Entities
Jennifer D'Souza

TL;DR
This paper introduces a large-scale, multidisciplinary dataset of STEM article abstracts annotated with four domain-independent scientific entities, enabling analysis and visualization of scientific knowledge across disciplines.
Contribution
It presents the creation and analysis of the STEM-NER-60k dataset, a large, publicly available corpus with structured annotations for four scientific entity types across ten STEM fields.
Findings
First large-scale analysis of multidisciplinary scientific entities
Domain-independent entity labels effectively characterize diverse STEM fields
Word cloud visualizations summarize key facets of scientific knowledge
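As a rough illustration of how word-cloud input could be derived from abstracts annotated with the four entity types, the sketch below counts how often each term occurs per type. The annotation layout (lists of `(entity_type, text)` pairs), the sample terms, and the function name `term_frequencies` are assumptions made for illustration only; they are not the STEM-NER-60k file format or the author's code.

```python
from collections import Counter, defaultdict

# The four entity types named in the paper.
ENTITY_TYPES = ("process", "method", "material", "data")

# Hypothetical annotations: one list of (entity_type, text) pairs per abstract.
# These sample terms are invented, not taken from the dataset.
annotated_abstracts = [
    [("process", "photosynthesis"), ("material", "chloroplast"), ("data", "growth rate")],
    [("method", "regression analysis"), ("data", "Growth Rate"), ("process", "photosynthesis")],
]


def term_frequencies(abstracts):
    """Count lowercased entity terms separately for each entity type."""
    counts = defaultdict(Counter)
    for entities in abstracts:
        for entity_type, text in entities:
            if entity_type not in ENTITY_TYPES:
                raise ValueError(f"unknown entity type: {entity_type}")
            counts[entity_type][text.lower()] += 1
    return counts


freqs = term_frequencies(annotated_abstracts)
for entity_type in ENTITY_TYPES:
    # The most frequent terms per type are what a word cloud would emphasize.
    print(entity_type, freqs[entity_type].most_common(3))
```

Per-type frequency tables like these can be passed to any word-cloud renderer; keeping one counter per entity type mirrors the idea of summarizing each facet of scientific knowledge separately.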
Abstract
We are faced with an unprecedented production of scholarly publications worldwide. Stakeholders in digital libraries posit that the document-based publishing paradigm has reached the limits of adequacy. Instead, structured, machine-interpretable, fine-grained scholarly knowledge publishing as Knowledge Graphs (KGs) is strongly advocated. In this work, we develop and analyze a large-scale structured dataset of STEM articles across 10 different disciplines, viz. Agriculture, Astronomy, Biology, Chemistry, Computer Science, Earth Science, Engineering, Material Science, Mathematics, and Medicine. Our analysis is defined over a large-scale corpus comprising 60K abstracts structured as four scientific entities: process, method, material, and data. Thus our study presents, for the first time, an analysis of a large-scale multidisciplinary corpus under the construct of four named entity…
Taxonomy
Topics: Semantic Web and Ontologies · Biomedical Text Mining and Ontologies · Scientific Computing and Data Management
