Coronavirus research topics, tracking twenty years of research
Amir Aryani, Jingbo Wang, Luis Salvador-Carulla, Jihoon Woo, Cathy P. W. Cheung, Zhuochen Wu, Hui Yin, Junhua Xiao, Elisabeth A. Lambert, Jason Howitt, Jean M. Davidson, Serene Yoong, John B. Dixon, Rachel E. Climie, Jose A. Salinas-Perez, Nasser Bagheri, Celine Santiago

TL;DR
This paper presents a dataset of over 800,000 coronavirus research articles from 2002 to 2024, organized into thematic clusters to help understand research trends and innovations.
Contribution
The novel contribution is a systematically curated and expert-reviewed dataset of coronavirus research, organized thematically for accessibility and reuse.
Findings
A dataset of over 800,000 coronavirus-related articles was created using natural language processing.
Research trends were organized into thematic clusters such as vaccine development and public health strategies.
The dataset was reviewed and revised with input from health experts to ensure accuracy and relevance.
Abstract
Research publications aimed at understanding the various aspects of coronaviruses, particularly COVID-19, have significantly shaped our knowledge base. While the urgency to monitor COVID-19 in real time has decreased, the continual monthly influx of new research articles underscores the importance of systematic review and analysis to deepen our understanding of the pandemic’s broad impact. To explore research trends and innovations in this space, we developed a pipeline using natural language processing techniques. This pipeline systematically catalogues and synthesises the vast array of research articles, leading to the creation of a dataset with more than eight hundred thousand articles from July 2002 to May 2024. This paper describes the content of this dataset and provides the necessary information to make this dataset accessible and reusable for future research. Our approach…
Taxonomy
Topics: COVID-19 epidemiological studies · COVID-19 Clinical Research Studies · Long-Term Effects of COVID-19
