Unsupervised Text Mining of COVID-19 Records

Mohamad Zamini

arXiv:2110.07357·cs.CL·October 15, 2021

Open Access

TL;DR

This paper preprocesses and annotates CORD-19, an existing COVID-19 medical dataset, to facilitate automated text mining and to support research and social interventions during the pandemic.

Contribution

It provides a publicly available, preprocessed version of the CORD-19 dataset, annotated for supervised classification, as a resource for pandemic-related text mining research.

Findings

01 Created a preprocessed COVID-19 dataset from CORD-19

02 Annotated the data for supervised classification tasks

03 Made the dataset publicly available on GitHub

Abstract

Since the outbreak of the coronavirus, the disease has spread worldwide and drastically changed many aspects of human life. Twitter, as a powerful tool, can help researchers measure the public health response to COVID-19. Given the high volume of data produced on social networks, automated text mining approaches can help search, read, and summarize helpful information. This paper preprocesses CORD-19, an existing medical dataset on COVID-19, and annotates it for supervised classification tasks. The preprocessed dataset is offered to the research community during the COVID-19 pandemic. It may contribute to finding new solutions for some of the social interventions prompted by COVID-19. The preprocessed version of the dataset is publicly available through GitHub.
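The abstract does not describe the preprocessing steps, so the following is only a minimal sketch of what cleaning one CORD-19 abstract for text mining might look like. The function name `preprocess`, the tiny `STOPWORDS` set, and the choice of steps (lowercasing, URL removal, tokenization, stopword and number filtering) are all assumptions made for illustration and are not taken from the paper.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch;
# a real pipeline would likely use a fuller list).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on", "with"}

# Matches http/https URLs up to the next whitespace character.
URL_RE = re.compile(r"https?://\S+")

# Matches alphanumeric tokens, keeping internal hyphens so "covid-19" stays whole.
TOKEN_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def preprocess(text: str) -> List[str]:
    """Lowercase, drop URLs, tokenize, and filter stopwords and pure numbers."""
    text = URL_RE.sub(" ", text.lower())
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in STOPWORDS and not t.isdigit()]


if __name__ == "__main__":
    sample = "The spread of COVID-19 in 2020: see https://example.org/paper."
    print(preprocess(sample))
```

Keeping hyphenated tokens intact is a design choice for this sketch: splitting on every punctuation mark would break "covid-19" into "covid" and a bare number that the filter then discards.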

Taxonomy

Topics: Misinformation and Its Impacts · Data-Driven Disease Surveillance · Sentiment Analysis and Opinion Mining