BioRED: A Rich Biomedical Relation Extraction Dataset
Ling Luo, Po-Ting Lai, Chih-Hsuan Wei, Cecilia N Arighi, Zhiyong Lu

TL;DR
BioRED is a comprehensive biomedical relation extraction dataset with multiple entity types and relation pairs at the document level, enabling improved development and benchmarking of RE systems in biomedicine.
Contribution
It introduces BioRED, the first biomedical RE corpus with diverse entity types, relation pairs, and relation annotations for novel and background knowledge, at the document level.
Findings
Existing models achieve high NER performance (F1 = 89.3%).
RE performance remains moderate (F1 = 47.7%), especially for novel relations.
The richness of the dataset is intended to support the development of more accurate biomedical RE systems.
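For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score for relation extraction can be computed. It assumes exact-match scoring over sets of relation tuples; the tuple layout, the identifiers, and the function name `precision_recall_f1` are illustrative assumptions, not the evaluation protocol used for BioRED.

```python
from typing import Set, Tuple

# Hypothetical tuple layout: (doc_id, entity_a, entity_b, relation_type).
RelationTuple = Tuple[str, str, str, str]


def precision_recall_f1(
    gold: Set[RelationTuple], predicted: Set[RelationTuple]
) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over two sets of relations."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data with made-up identifiers: 3 gold relations, 2 predictions, 1 correct.
gold = {
    ("doc1", "G1", "D1", "association"),
    ("doc1", "C1", "D1", "association"),
    ("doc1", "C1", "C2", "association"),
}
predicted = {
    ("doc1", "G1", "D1", "association"),
    ("doc1", "G1", "C2", "association"),
}

precision, recall, f1 = precision_recall_f1(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy case precision is 1/2 and recall is 1/3, giving F1 = 0.4. A stricter or looser matching criterion (for example, ignoring the relation type) would change the score.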
Abstract
Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE only focus on relations of a single type (e.g., protein-protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then we present BioRED, a first-of-its-kind biomedical RE corpus with multiple entity types (e.g., gene/protein, disease, chemical) and relation pairs (e.g., gene-disease; chemical-chemical) at the document level, on a set of 600 PubMed abstracts. Further, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between…
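To make the annotation scheme described in the abstract concrete, here is a hedged sketch of how a document-level relation with a novelty label might be represented. The class names, field names, identifiers, and label strings are all hypothetical and do not reflect the dataset's actual file format or label set.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entity:
    identifier: str   # hypothetical concept identifier
    entity_type: str  # e.g. "gene", "disease", "chemical"


@dataclass(frozen=True)
class RelationAnnotation:
    doc_id: str         # the abstract the relation is asserted in
    head: Entity
    tail: Entity
    relation_type: str  # placeholder label, not the dataset's label set
    novel: bool         # True = novel finding, False = background knowledge


def novel_relations(relations: List[RelationAnnotation]) -> List[RelationAnnotation]:
    """Keep only the relations labeled as novel findings."""
    return [r for r in relations if r.novel]


# Made-up example annotations for a single abstract.
gene = Entity("G1", "gene")
disease = Entity("D1", "disease")
chemical = Entity("C1", "chemical")

annotations = [
    RelationAnnotation("doc1", gene, disease, "association", novel=True),
    RelationAnnotation("doc1", chemical, disease, "association", novel=False),
]

for relation in novel_relations(annotations):
    print(relation.head.entity_type, relation.tail.entity_type, relation.relation_type)
```

Because a relation here is attached to a document rather than to a sentence, the two entities may be mentioned in different sentences of the same abstract, which is what distinguishes document-level RE from the sentence-level setting the abstract contrasts it with.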
