BioRED: A Rich Biomedical Relation Extraction Dataset

Ling Luo; Po-Ting Lai; Chih-Hsuan Wei; Cecilia N Arighi; Zhiyong Lu

arXiv:2204.04263·cs.CL·July 20, 2022

TL;DR

BioRED is a comprehensive biomedical relation extraction dataset with multiple entity types and relation pairs at the document level, enabling improved development and benchmarking of RE systems in biomedicine.

Contribution

The paper introduces BioRED, described as the first document-level biomedical RE corpus that combines diverse entity types, multiple relation pairs, and labels distinguishing novel findings from background knowledge.

Findings

01 High NER performance (F1=89.3%) achieved by existing models.

02 RE performance remains moderate (F1=47.7%), especially for novel relations.

03 Rich dataset facilitates development of more accurate biomedical RE systems.

Abstract

Automated relation extraction (RE) from biomedical literature is critical for many downstream text mining applications in both research and real-world settings. However, most existing benchmarking datasets for biomedical RE only focus on relations of a single type (e.g., protein-protein interactions) at the sentence level, greatly limiting the development of RE systems in biomedicine. In this work, we first review commonly used named entity recognition (NER) and RE datasets. Then we present BioRED, a first-of-its-kind biomedical RE corpus with multiple entity types (e.g., gene/protein, disease, chemical) and relation pairs (e.g., gene-disease; chemical-chemical) at the document level, on a set of 600 PubMed abstracts. Further, we label each relation as describing either a novel finding or previously known background knowledge, enabling automated algorithms to differentiate between…
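
To make the annotation scheme described in the abstract more concrete, the following is a minimal sketch of how a document-level relation with a novelty label might be represented and summarized. It is not the corpus's actual file format or schema: the class names, field names, the "association" label, and the identifiers (G1, D1, C1, C2, doc-1) are all hypothetical placeholders chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    identifier: str   # hypothetical placeholder ID, not a real database ID
    entity_type: str  # e.g. "gene", "disease", "chemical"


@dataclass(frozen=True)
class Relation:
    doc_id: str         # relations are attached to a document, not a sentence
    head: Entity
    tail: Entity
    relation_type: str  # hypothetical label used only for this sketch
    novel: bool         # True = novel finding, False = background knowledge


def pair_type(rel: Relation) -> str:
    """Return the entity-type pair of a relation, e.g. 'gene-disease'."""
    return f"{rel.head.entity_type}-{rel.tail.entity_type}"


def count_by_pair_and_novelty(relations):
    """Count relations per (entity-type pair, novelty flag)."""
    return Counter((pair_type(r), r.novel) for r in relations)


# Made-up example annotations for a single document.
example = [
    Relation("doc-1", Entity("G1", "gene"), Entity("D1", "disease"),
             "association", novel=True),
    Relation("doc-1", Entity("C1", "chemical"), Entity("C2", "chemical"),
             "association", novel=False),
    Relation("doc-1", Entity("G1", "gene"), Entity("D1", "disease"),
             "association", novel=False),
]

counts = count_by_pair_and_novelty(example)
for (pair, novel), n in sorted(counts.items()):
    label = "novel" if novel else "background"
    print(f"{pair:20s} {label:10s} {n}")
```

Keeping the novelty flag as a separate field, rather than folding it into the relation type, lets the same records be scored with or without the novelty distinction.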

Code & Models

Repositories

ncbi/BioRED

Datasets

bigbio/biored
dataset · 156 downloads