MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts

Sunil Mohan; Donghui Li

arXiv:1902.09476 · cs.CL · February 26, 2019 · 47 citations

TL;DR

MedMentions is a large, manually annotated biomedical corpus whose mentions are linked to UMLS concepts. It is released with training and testing splits and a baseline entity-linking model to support research in biomedical named entity recognition and linking.

Contribution

This paper introduces MedMentions, a biomedical corpus of over 4,000 abstracts and over 350,000 mentions linked to UMLS 2017 concepts, together with training and testing splits and a baseline model for entity linking.

Findings

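01 Over 4,000 abstracts annotated with over 350,000 linked mentions

02 Includes a sub-corpus annotated with a subset of UMLS 2017 targeted towards document retrieval

03 Provides training and testing splits, plus a baseline entity-linking model and its metrics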

Abstract

This paper presents the formal release of MedMentions, a new manually annotated resource for the recognition of biomedical concepts. What distinguishes MedMentions from other annotated biomedical corpora is its size (over 4,000 abstracts and over 350,000 linked mentions), as well as the size of the concept ontology (over 3 million concepts from UMLS 2017) and its broad coverage of biomedical disciplines. In addition to the full corpus, a sub-corpus of MedMentions is also presented, comprising annotations for a subset of UMLS 2017 targeted towards document retrieval. To encourage research in Biomedical Named Entity Recognition and Linking, data splits for training and testing are included in the release, and a baseline model and its metrics for entity linking are also described.
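To make the phrase "metrics for entity linking" concrete, here is a minimal sketch of mention-level precision, recall, and F1. It assumes a predicted mention counts as correct only when its document, character span, and concept identifier all match a gold annotation exactly; the abstract does not specify the paper's evaluation protocol, so this matching rule is an assumption. The `Mention` type, the `linking_prf` function, and the concept identifiers are hypothetical and used for illustration only.

```python
from typing import Iterable, NamedTuple, Tuple


class Mention(NamedTuple):
    """One linked mention (hypothetical structure, not the corpus file format)."""
    doc_id: str
    start: int       # character offset where the mention begins
    end: int         # character offset where the mention ends
    concept_id: str  # concept identifier the mention is linked to


def linking_prf(gold: Iterable[Mention],
                predicted: Iterable[Mention]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over linked mentions.

    Assumption: a prediction is a true positive only if document, span,
    and concept all equal a gold annotation.
    """
    gold_set = set(gold)
    pred_set = set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up annotations; the concept IDs are placeholders, not real UMLS entries.
gold = [
    Mention("doc1", 0, 5, "C0000001"),
    Mention("doc1", 10, 18, "C0000002"),
    Mention("doc2", 3, 9, "C0000003"),
]
predicted = [
    Mention("doc1", 0, 5, "C0000001"),    # correct span and concept
    Mention("doc1", 10, 18, "C0000009"),  # correct span, wrong concept
    Mention("doc2", 3, 9, "C0000003"),    # correct span and concept
    Mention("doc2", 20, 25, "C0000004"),  # spurious mention
]

precision, recall, f1 = linking_prf(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Under this strict rule, a mention with the right span but the wrong concept is penalized twice: once as a false positive and once as a missed gold mention. Relaxed variants (for example, partial span overlap) are also common, and the choice should follow whatever protocol the evaluation being reproduced uses.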

Code & Models

Repositories

chanzuckerberg/MedMentions (Official)

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Natural Language Processing Techniques