NEREL-BIO: A Dataset of Biomedical Abstracts Annotated with Nested Named Entities
Natalia Loukachevitch, Suresh Manandhar, Elina Baral, Igor Rozhkov, Pavel Braslavski, Vladimir Ivanov, Tatiana Batura, and Elena Tutubalina

TL;DR
NEREL-BIO is a new annotated corpus of Russian and English biomedical abstracts with nested named entities, designed for domain transfer and cross-language experiments, and includes benchmark results with transformer and MRC models.
Contribution
It introduces a domain-specific, nested entity annotation scheme for biomedical abstracts in Russian and English, expanding the NEREL dataset for transfer learning tasks.
Findings
Transformer-based models achieve high accuracy on the dataset.
MRC models effectively detect nested entities.
The dataset enables cross-domain and cross-language transfer experiments.
Abstract
This paper describes NEREL-BIO -- an annotation scheme and corpus of PubMed abstracts in Russian and a smaller number of abstracts in English. NEREL-BIO extends the general domain dataset NEREL by introducing domain-specific entity types. The NEREL-BIO annotation scheme covers both general and biomedical domains, making it suitable for domain transfer experiments. NEREL-BIO provides annotation for nested named entities as an extension of the scheme employed for NEREL. Nested named entities may cross entity boundaries to connect to shorter entities nested within longer entities, making them harder to detect. NEREL-BIO contains annotations for 700+ Russian and 100+ English abstracts. All English PubMed annotations have corresponding Russian counterparts. Thus, NEREL-BIO comprises the following specific features: annotation of nested named entities, it can be used as a benchmark for…
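To make the idea of nested named entities concrete, here is a minimal sketch in Python. It is not taken from the paper or the corpus tooling: the `Span` type, the `nested_pairs` helper, the example phrase, and the labels `DISO` and `ANATOMY` are all illustrative assumptions. The sketch represents each annotation as a character span and finds shorter entities that lie entirely inside longer ones.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """A hypothetical entity annotation over a text (not the corpus format)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # entity type; the labels used below are illustrative


def nested_pairs(spans):
    """Return (outer, inner) pairs where inner lies entirely inside a longer outer span."""
    pairs = []
    for outer in spans:
        for inner in spans:
            if inner is outer:
                continue
            contains = outer.start <= inner.start and inner.end <= outer.end
            strictly_longer = (outer.end - outer.start) > (inner.end - inner.start)
            if contains and strictly_longer:
                pairs.append((outer, inner))
    return pairs


# Invented example phrase; the annotations are assumptions for illustration only.
text = "chronic kidney disease"
spans = [
    Span(0, 22, "DISO"),     # the whole phrase
    Span(8, 14, "ANATOMY"),  # "kidney", nested inside the longer entity
]

pairs = nested_pairs(spans)
for outer, inner in pairs:
    print(f"{text[inner.start:inner.end]!r} ({inner.label}) is nested in "
          f"{text[outer.start:outer.end]!r} ({outer.label})")
```

A flat tagger that assigns one label per token can keep only one of these two spans, which is one reason nested annotation is harder to model than flat annotation.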
Taxonomy
Topics: Biomedical Text Mining and Ontologies
