Comprehensive Named Entity Recognition on CORD-19 with Distant or Weak Supervision
Xuan Wang, Xiangchen Song, Bangzheng Li, Yingjun Guan, Jiawei Han

TL;DR
This paper introduces CORD-NER, a comprehensive dataset for named entity recognition in COVID-19 research literature, covering 75 entity types with high annotation quality and support for incremental updates.
Contribution
The paper presents CORD-NER, a large-scale NER dataset built on the CORD-19 corpus (2020-03-13). It covers 75 fine-grained entity types, combines annotations from four sources that use different NER methods, and supports incremental updates.
Findings
CORD-NER surpasses SciSpacy, a fully supervised BioNER tool, in annotation quality: its F1 score is over 10% higher on a sample set of documents.
Supports incremental addition of documents and entity types.
Covers 75 detailed entity types relevant to COVID-19 research.
Abstract
We created this CORD-NER dataset with comprehensive named entity recognition (NER) on the COVID-19 Open Research Dataset Challenge (CORD-19) corpus (2020-03-13). This CORD-NER dataset covers 75 fine-grained entity types: In addition to the common biomedical entity types (e.g., genes, chemicals and diseases), it covers many new entity types related explicitly to the COVID-19 studies (e.g., coronaviruses, viral proteins, evolution, materials, substrates and immune responses), which may benefit research on COVID-19 related virus, spreading mechanisms, and potential vaccines. CORD-NER annotation is a combination of four sources with different NER methods. The quality of CORD-NER annotation surpasses SciSpacy (over 10% higher on the F1 score based on a sample set of documents), a fully supervised BioNER tool. Moreover, CORD-NER supports incrementally adding new documents as well as adding…
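The F1 comparison with SciSpacy mentioned above can be illustrated with a minimal sketch of entity-level precision, recall, and F1. It assumes exact matching on span offsets and entity type, which may differ from the matching criterion the authors used. The function name `entity_f1`, the tuple layout, the type labels, and the toy annotations are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# Assumed representation: (start offset, end offset, entity type).
Entity = Tuple[int, int, str]


def entity_f1(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span-and-type matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy annotations for one document, not data from the paper.
gold = {
    (0, 10, "CORONAVIRUS"),
    (15, 22, "GENE"),
    (30, 38, "CHEMICAL"),
    (40, 47, "DISEASE"),
}
predicted = {
    (0, 10, "CORONAVIRUS"),
    (15, 22, "GENE"),
    (30, 38, "DISEASE"),  # correct span, wrong type: counted as an error
}

precision, recall, f1 = entity_f1(gold, predicted)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this strict criterion a prediction with the right span but the wrong type counts as both a false positive and a false negative, so fine-grained type inventories are penalized more heavily than coarse ones.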
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Biomedical Text Mining and Ontologies
