Unsupervised Text Mining of COVID-19 Records
Mohamad Zamini

TL;DR
This paper preprocesses and annotates CORD-19, an existing medical dataset on COVID-19, to facilitate automated text mining, which may aid research and social interventions during the pandemic.
Contribution
It provides a publicly available, preprocessed version of the CORD-19 dataset, annotated for supervised classification, as a resource for COVID-19 text-mining research.
Findings
Created a preprocessed COVID-19 dataset from CORD-19
Annotated data for supervised classification tasks
Made the dataset publicly available on GitHub
Abstract
Since the beginning of the coronavirus outbreak, the disease has spread worldwide and drastically changed many aspects of human life. Twitter, as a powerful tool, can help researchers measure the public health response to COVID-19. Given the high volume of data produced on social networks, automated text mining approaches can help search, read, and summarize useful information. This paper preprocesses CORD-19, an existing medical dataset on COVID-19, and annotates it for supervised classification tasks. During the COVID-19 pandemic, we made this preprocessed dataset available to the research community. It may contribute to finding new solutions for some of the social interventions prompted by COVID-19. The preprocessed version of the dataset is publicly available on GitHub.
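The abstract does not list the preprocessing steps applied to CORD-19. As a rough illustration only, the sketch below shows one common kind of text cleanup (lowercasing, URL removal, tokenization, stopword filtering). The function name `preprocess`, the stopword list, and the sample sentence are all hypothetical and are not taken from the paper or its repository.

```python
import re

# Hypothetical, deliberately tiny stopword list (illustration only).
STOPWORDS = {"the", "of", "and", "a", "an", "in", "to", "is", "for", "on", "with"}

# Matches http/https URLs up to the next whitespace.
URL_RE = re.compile(r"https?://\S+")
# Matches lowercase alphanumeric tokens, keeping internal hyphens ("covid-19").
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def preprocess(text):
    """Lowercase the text, drop URLs, tokenize, and remove stopwords."""
    text = URL_RE.sub(" ", text.lower())
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOPWORDS]


# Made-up sample sentence, not a record from CORD-19.
sample = "The spread of COVID-19 is described in https://example.org/paper."
tokens = preprocess(sample)
print(tokens)
```

A real pipeline would likely use a fuller stopword list and domain-aware tokenization; this sketch only shows the general shape of such a step.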
Taxonomy
Topics: Misinformation and Its Impacts · Data-Driven Disease Surveillance · Sentiment Analysis and Opinion Mining
