SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents
Qi Zhang, Zhijia Chen, Huitong Pan, Cornelia Caragea, Longin Jan Latecki, Eduard Dragut

TL;DR
This paper introduces a comprehensive dataset for scientific entity and relation extraction, covering full-text articles with detailed annotations, to advance the development of more effective SciIE models.
Contribution
It provides a new, large-scale, full-text scientific dataset with fine-grained annotations and an out-of-distribution test set, enabling more realistic evaluation of SciIE models.
Findings
State-of-the-art models perform poorly on the dataset.
LLM-based baselines show promising results but still face challenges.
The dataset reveals complexities in extracting scientific entities and relations.
Abstract
Scientific information extraction (SciIE) is critical for converting unstructured knowledge from scholarly articles into structured data (entities and relations). Several datasets have been proposed for training and validating SciIE models. However, due to the high complexity and cost of annotating scientific texts, those datasets restrict their annotations to specific parts of a paper, such as abstracts, resulting in the loss of diverse entity mentions and relations in context. In this paper, we release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles. Our dataset contains 106 manually annotated full-text scientific publications with over 24k entities and 12k relations. To capture the intricate use and interactions among entities in full texts, our dataset contains a fine-grained tag set for relations. Additionally,…
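To make the annotation scheme described in the abstract concrete, here is a minimal sketch of how typed entity spans and labeled relations might be represented. The three entity types come from the abstract; the example sentence, the class and function names, and the relation labels `EVALUATED-ON` and `USED-FOR` are illustrative assumptions, not the dataset's actual schema or tag set.

```python
from dataclasses import dataclass

# Entity types named in the abstract.
ENTITY_TYPES = {"Dataset", "Method", "Task"}


@dataclass(frozen=True)
class Entity:
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    type: str   # one of ENTITY_TYPES

    def __post_init__(self):
        if self.type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.type}")


@dataclass(frozen=True)
class Relation:
    head: Entity
    tail: Entity
    label: str  # hypothetical label; the real tag set is defined in the paper


def span(text: str, mention: str, type: str) -> Entity:
    """Locate the first occurrence of `mention` in `text` and wrap it as an Entity."""
    start = text.index(mention)
    return Entity(start, start + len(mention), type)


def surface(text: str, entity: Entity) -> str:
    """Return the text covered by an entity span."""
    return text[entity.start:entity.end]


# Illustrative sentence, not taken from the dataset.
sentence = "We evaluate BERT on SQuAD for question answering."

bert = span(sentence, "BERT", "Method")
squad = span(sentence, "SQuAD", "Dataset")
qa = span(sentence, "question answering", "Task")

relations = [
    Relation(bert, squad, "EVALUATED-ON"),  # assumed label
    Relation(bert, qa, "USED-FOR"),         # assumed label
]

for r in relations:
    print(surface(sentence, r.head), r.label, surface(sentence, r.tail))
```

Storing character offsets rather than raw strings keeps each mention tied to its position in the full text, which matters when the same entity is mentioned many times across a paper.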
Taxonomy
Topics: Semantic Web and Ontologies · Topic Modeling · Scientific Computing and Data Management
Methods: Sparse Evolutionary Training
