Characterizing drug mentions in COVID-19 Twitter Chatter

Ramya Tekumalla; Juan M. Banda

arXiv:2007.10276·cs.IR·October 12, 2020

TL;DR

This study analyzes drug mentions in COVID-19 Twitter chatter. It shows that machine learning and pre-processing are needed to identify drug names accurately, because tweets contain informal language and misspellings.

Contribution

The paper combines machine learning with traditional automated methods to improve drug mention detection in social media data, addressing the challenges posed by informal language and misspellings.

Findings

01

Recovered almost 15% additional drug-mention data by combining the complementary methods.

02

Demonstrated that misspelling handling is a necessary pre-processing step for social media text analysis.

03

Showed that machine learning effectively complements traditional automated methods.

Abstract

Since the classification of COVID-19 as a global pandemic, there have been many attempts to treat and contain the virus. Although there is no specific antiviral treatment recommended for COVID-19, several drugs can potentially help with symptoms. In this work, we mined a large Twitter dataset of 424 million tweets of COVID-19 chatter to identify discourse around drug mentions. This may seem a straightforward task, but because language use on Twitter is informal, we demonstrate the need for machine learning alongside traditional automated methods. By applying these complementary methods, we recover almost 15% additional data, which makes misspelling handling a necessary pre-processing step when dealing with social media data.
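The abstract does not spell out the matching pipeline, so the following is only a minimal sketch of why misspelling handling recovers extra mentions. It assumes a tiny hypothetical drug list (DRUG_TERMS), hypothetical toy tweets (TWEETS), and illustrative helper names (exact_mentions, fuzzy_mentions, additional_recovery). Python's difflib string similarity with an assumed cutoff of 0.85 stands in for the paper's machine learning component; it is not the authors' method, and the toy percentage it prints is not a result from the paper.

```python
import difflib
import re

# Hypothetical toy drug dictionary (illustrative only, not the paper's resource).
DRUG_TERMS = ["hydroxychloroquine", "remdesivir", "azithromycin", "dexamethasone"]

# Hypothetical toy tweets; two contain misspelled drug names.
TWEETS = [
    "Is hydroxychloroquine actually working?",
    "hydroxycloroquine and azithromycin combo",
    "Remdesivir trial results look promising",
    "got prescribed dexametasone today",
]


def tokenize(text: str) -> list[str]:
    """Lowercase a tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def exact_mentions(tweet: str) -> list[str]:
    """Plain dictionary lookup: only tokens spelled exactly like an entry match."""
    vocab = set(DRUG_TERMS)
    return [tok for tok in tokenize(tweet) if tok in vocab]


def fuzzy_mentions(tweet: str, cutoff: float = 0.85) -> list[str]:
    """Map each token to its closest dictionary entry, tolerating misspellings.

    difflib's similarity ratio is an assumed stand-in for a learned
    misspelling model; exact spellings still match with ratio 1.0.
    """
    found = []
    for tok in tokenize(tweet):
        match = difflib.get_close_matches(tok, DRUG_TERMS, n=1, cutoff=cutoff)
        if match:
            found.append(match[0])
    return found


def additional_recovery(tweets: list[str]) -> tuple[int, int, float]:
    """Count mentions under both strategies and the extra percentage recovered."""
    exact = sum(len(exact_mentions(t)) for t in tweets)
    fuzzy = sum(len(fuzzy_mentions(t)) for t in tweets)
    extra_pct = 100.0 * (fuzzy - exact) / exact if exact else float("inf")
    return exact, fuzzy, extra_pct


if __name__ == "__main__":
    exact, fuzzy, extra = additional_recovery(TWEETS)
    print(f"exact={exact} fuzzy={fuzzy} extra={extra:.1f}%")
```

On these toy tweets, exact lookup misses "hydroxycloroquine" and "dexametasone", while the similarity-based pass maps both to their dictionary entries. The percentage is computed relative to the exact-match baseline, which is an assumed definition of "additional data".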
