A Comprehensive Dictionary and Term Variation Analysis for COVID-19 and SARS-CoV-2
Robert Leaman, Zhiyong Lu

TL;DR
This paper presents an extensive dictionary of terms for SARS-CoV-2 and COVID-19, generated iteratively with a rule-based approach, enabling high-recall identification of these entities in the literature despite high term variation.
Contribution
The authors developed a comprehensive, iteratively generated dictionary of COVID-19 and SARS-CoV-2 terms, surpassing existing resources in coverage and facilitating analysis of term usage over time.
Findings
Dictionary contains more terms than existing resources
Number of unique terms in use continues to grow rapidly over time
Dictionary is freely available for research use
Abstract
The number of unique terms in the scientific literature used to refer to either SARS-CoV-2 or COVID-19 is remarkably large and has continued to increase rapidly despite well-established standardized terms. This high degree of term variation makes high recall identification of these important entities difficult. In this manuscript we present an extensive dictionary of terms used in the literature to refer to SARS-CoV-2 and COVID-19. We use a rule-based approach to iteratively generate new term variants, then locate these variants in a large text corpus. We compare our dictionary to an extensive collection of terminological resources, demonstrating that our resource provides a substantial number of additional terms. We use our dictionary to analyze the usage of SARS-CoV-2 and COVID-19 terms over time and show that the number of unique terms continues to grow rapidly. Our dictionary is…
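The generate-then-locate loop described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the three rewrite rules (separator swaps and a "19" to "2019" expansion), the function names `expand_variants` and `locate`, and the toy corpus are hypothetical and are not the authors' actual rule set or data.

```python
import re

# Hypothetical seed terms, lowercased so matching is case-insensitive.
SEEDS = {"covid-19", "sars-cov-2"}

def apply_rules(term):
    """Return the one-step variants of `term` under a few illustrative rules."""
    variants = set()
    # Rule 1 (assumed): swap or drop a single separator at a time.
    for i, ch in enumerate(term):
        if ch in "- ":
            for rep in ("-", " ", ""):
                if rep != ch:
                    variants.add(term[:i] + rep + term[i + 1:])
    # Rule 2 (assumed): expand a standalone "19" into "2019".
    expanded = re.sub(r"(?<!\d)19(?!\d)", "2019", term)
    if expanded != term:
        variants.add(expanded)
    return variants

def expand_variants(seeds, max_rounds=10):
    """Apply the rules repeatedly until no new variant appears."""
    known = set(seeds)
    frontier = set(seeds)
    for _ in range(max_rounds):
        new = set()
        for term in frontier:
            new |= apply_rules(term) - known
        if not new:
            break
        known |= new
        frontier = new
    return known

def locate(terms, corpus):
    """Count whole-token occurrences of each candidate term in the corpus."""
    counts = {}
    for term in terms:
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")
        n = sum(len(pattern.findall(doc.lower())) for doc in corpus)
        if n:
            counts[term] = n
    return counts

# Toy corpus standing in for a large literature collection.
CORPUS = [
    "Patients with COVID19 were tested for SARS-CoV2.",
    "The covid 2019 outbreak spread quickly.",
    "SARS-CoV-2 is the virus that causes COVID-19.",
]

candidates = expand_variants(SEEDS)
attested = locate(candidates, CORPUS)
print(sorted(attested))
```

Only candidates that actually occur in the corpus are kept, so the rules can over-generate freely: unattested variants are simply discarded at the locate step.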
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Linguistics and Terminology Studies · Natural Language Processing Techniques
