A Domain-Specific Curated Benchmark for Entity and Document-Level Relation Extraction
Marco Martinelli, Stefano Marchesin, Vanessa Bonato, Giorgio Maria Di Nunzio, Nicola Ferro, Ornella Irrera, Laura Menotti, Federica Vezzani, Gianmaria Silvello

TL;DR
GutBrainIE is a manually curated benchmark built from more than 1,600 PubMed abstracts on the gut-brain axis. Biomedical and terminological experts annotated it with fine-grained entities, concept-level links, and relations, so it supports NER, NEL, and RE.
Contribution
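The paper introduces GutBrainIE, a biomedical IE benchmark with expert annotations, a rich schema, and support for multiple tasks. It targets two limitations of existing benchmarks: narrow scope and heavy reliance on distantly supervised or automatically generated annotations.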
Findings
Provides over 1,600 expertly annotated PubMed abstracts
Supports multiple IE tasks with rich, fine-grained annotations
Combines curated and weakly supervised data for broad applicability
Abstract
Information Extraction (IE), encompassing Named Entity Recognition (NER), Named Entity Linking (NEL), and Relation Extraction (RE), is critical for transforming the rapidly growing volume of scientific publications into structured, actionable knowledge. This need is especially evident in fast-evolving biomedical fields such as the gut-brain axis, where research investigates complex interactions between the gut microbiota and brain-related disorders. Existing biomedical IE benchmarks, however, are often narrow in scope and rely heavily on distantly supervised or automatically generated annotations, limiting their utility for advancing robust IE methods. We introduce GutBrainIE, a benchmark based on more than 1,600 PubMed abstracts, manually annotated by biomedical and terminological experts with fine-grained entities, concept-level links, and relations. While grounded in the gut-brain…
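The three annotation layers named in the abstract (entities, concept-level links, and relations) can be pictured as one record per abstract. The sketch below is illustrative only: the class names, field names, labels, the concept identifier, and the example sentence are all assumptions made for this page and are not taken from the GutBrainIE schema or data.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class EntityMention:
    """A span of text with a type (NER) and an optional concept link (NEL)."""
    start: int                        # character offset, inclusive
    end: int                          # character offset, exclusive
    text: str                         # surface form covered by the span
    label: str                        # hypothetical entity type
    concept_id: Optional[str] = None  # hypothetical concept identifier


@dataclass(frozen=True)
class Relation:
    """A typed link between two mentions in the same document (RE)."""
    subject: EntityMention
    predicate: str                    # hypothetical relation type
    obj: EntityMention


@dataclass
class AnnotatedAbstract:
    doc_id: str
    text: str
    entities: List[EntityMention] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def misaligned_mentions(self) -> List[EntityMention]:
        """Return mentions whose offsets do not reproduce their surface text."""
        return [m for m in self.entities if self.text[m.start:m.end] != m.text]


def mention(text: str, surface: str, label: str,
            concept_id: Optional[str] = None) -> EntityMention:
    """Build a mention from the first occurrence of `surface` in `text`."""
    start = text.index(surface)
    return EntityMention(start, start + len(surface), surface, label, concept_id)


# Invented sentence and labels, for illustration only.
text = "Gut microbiota composition is altered in patients with depression."
microbiota = mention(text, "Gut microbiota", "microbiome")
depression = mention(text, "depression", "disorder", "EXAMPLE:0001")

doc = AnnotatedAbstract(
    doc_id="example-1",
    text=text,
    entities=[microbiota, depression],
    relations=[Relation(microbiota, "is linked to", depression)],
)
```

Keeping character offsets alongside the surface text makes it cheap to check that a record is internally consistent, which is what `misaligned_mentions` does here.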
Taxonomy
Topics: Topic Modeling · Biomedical Text Mining and Ontologies · Advanced Text Analysis Techniques
