GSAP-ERE: Fine-Grained Scholarly Entity and Relation Extraction Focused on Machine Learning
Wolfgang Otto, Lu Gan, Sharmila Upadhyaya, Saurav Karmakar, Stefan Dietze

TL;DR
GSAP-ERE is a fine-grained dataset for extracting scholarly entities and relations in ML research, enabling improved knowledge graph construction and reproducibility analysis, and highlighting the superiority of fine-tuned models over LLM prompting.
Contribution
Introduces a manually curated dataset with 10 entity types and 18 semantically categorized relation types, annotated on the full text of 100 ML publications, to support fine-grained information extraction and analysis.
Findings
Fine-tuned supervised models outperform LLM prompting on scholarly entity and relation extraction.
The dataset enables knowledge graph construction from scientific texts.
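As a rough illustration of how extracted entities and relations could feed knowledge graph construction, the sketch below represents annotations as typed spans and turns relations into (head, relation, tail) triples. The class names, the helper `find_span`, the labels `Model`, `Dataset`, and `trained_on`, and the example sentence are all hypothetical; they are not taken from the GSAP-ERE schema or its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    start: int   # character offset where the mention begins
    end: int     # character offset one past the end of the mention
    label: str   # entity type (hypothetical label, not the GSAP-ERE schema)
    text: str    # surface form of the mention


@dataclass(frozen=True)
class Relation:
    head: Entity
    tail: Entity
    label: str   # relation type (hypothetical label)


def find_span(sentence: str, surface: str, label: str) -> Entity:
    """Locate the first occurrence of `surface` and wrap it as an Entity."""
    start = sentence.index(surface)
    return Entity(start, start + len(surface), label, surface)


def to_triples(relations):
    """Convert relation annotations into (head, relation, tail) KG triples."""
    return [(r.head.text, r.label, r.tail.text) for r in relations]


# Made-up example sentence, used only to exercise the data structures.
sentence = "We fine-tune BERT on SQuAD."
model = find_span(sentence, "BERT", "Model")
dataset = find_span(sentence, "SQuAD", "Dataset")
relations = [Relation(head=model, tail=dataset, label="trained_on")]

triples = to_triples(relations)
print(triples)
```

Keeping character offsets on each entity means a triple can always be traced back to the exact mention in the source text, which matters when the same surface form appears several times in a paper.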
Abstract
Research in Machine Learning (ML) and AI evolves rapidly. Information Extraction (IE) from scientific publications makes it possible to identify information about research concepts and resources on a large scale, and is therefore a pathway to improving the understanding and reproducibility of ML-related research. To extract and connect fine-grained information in ML-related research, e.g., method training and data usage, we introduce GSAP-ERE. It is a manually curated, fine-grained dataset with 10 entity types and 18 semantically categorized relation types, containing mentions of 63K entities and 35K relations from the full text of 100 ML publications. We show that our dataset enables fine-tuned models to automatically extract information relevant for downstream tasks ranging from knowledge graph (KG) construction to monitoring the computational reproducibility of AI research at scale. Additionally, we…
Taxonomy
Topics: Advanced Graph Neural Networks · Biomedical Text Mining and Ontologies · Topic Modeling
