Exploratory Analysis of Covid-19 Tweets using Topic Modeling, UMAP, and DiGraphs
Catherine Ordun, Sanjay Purushotham, Edward Raff

TL;DR
This paper applies several machine learning and network analysis techniques to Covid-19 tweets, characterizing topic themes, the speed of information spread, and user engagement patterns.
Contribution
It applies machine learning methods not previously reported in the Covid-19 Twitter literature, including UMAP, to improve understanding of topic clustering and information dissemination.
Findings
A topic specific to U.S. cases upticks immediately after live White House Coronavirus Task Force briefings
Median retweet time was 2.87 hours in March 2020
Distinct user groups dominate Covid-19 retweet cascades
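As an illustration of how a median retweet time could be computed, here is a minimal Python sketch. It assumes "retweet time" means the delay between an original tweet's timestamp and each retweet's timestamp; the summary does not state the exact definition, and the function name `median_retweet_hours` and the toy timestamps are hypothetical, not data from the paper.

```python
from datetime import datetime
from statistics import median


def median_retweet_hours(pairs):
    """Return the median delay in hours between original tweets and retweets.

    `pairs` is an iterable of (original_created_at, retweet_created_at)
    datetime tuples. This definition of retweet time is an assumption.
    """
    delays = [(rt - orig).total_seconds() / 3600.0 for orig, rt in pairs]
    if not delays:
        raise ValueError("at least one retweet is required")
    return median(delays)


# Toy timestamps for illustration only (not taken from the paper's dataset).
toy_pairs = [
    (datetime(2020, 3, 24, 12, 0), datetime(2020, 3, 24, 13, 0)),  # 1 hour
    (datetime(2020, 3, 24, 12, 0), datetime(2020, 3, 24, 15, 0)),  # 3 hours
    (datetime(2020, 3, 24, 12, 0), datetime(2020, 3, 25, 12, 0)),  # 24 hours
]
toy_median = median_retweet_hours(toy_pairs)
```

The median is used rather than the mean because a few very late retweets (such as the 24-hour one in the toy data) would otherwise dominate the estimate.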
Abstract
This paper illustrates five different techniques to assess the distinctiveness of topics, key terms and features, speed of information dissemination, and network behaviors for Covid19 tweets. First, we use pattern matching and second, topic modeling through Latent Dirichlet Allocation (LDA) to generate twenty different topics that discuss case spread, healthcare workers, and personal protective equipment (PPE). One topic specific to U.S. cases would start to uptick immediately after live White House Coronavirus Task Force briefings, implying that many Twitter users are paying attention to government announcements. We contribute machine learning methods not previously reported in the Covid19 Twitter literature. This includes our third method, Uniform Manifold Approximation and Projection (UMAP), that identifies unique clustering-behavior of distinct topics to improve our understanding of…
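The pattern-matching step named first in the abstract might look roughly like the following Python sketch. The abstract does not list the patterns the authors used, so the `TERM_PATTERNS` dictionary, the helper `count_matching_tweets`, and the toy tweets are all hypothetical; the terms are borrowed from the themes the abstract mentions (cases, healthcare workers, PPE).

```python
import re
from collections import Counter

# Hypothetical term patterns; the paper's actual patterns are not given here.
TERM_PATTERNS = {
    "ppe": re.compile(r"\b(?:ppe|personal protective equipment)\b", re.IGNORECASE),
    "healthcare_workers": re.compile(r"\b(?:healthcare|health care) workers?\b", re.IGNORECASE),
    "cases": re.compile(r"\bcases?\b", re.IGNORECASE),
}


def count_matching_tweets(tweets):
    """Count how many tweets match each labeled pattern at least once."""
    counts = Counter()
    for text in tweets:
        for label, pattern in TERM_PATTERNS.items():
            if pattern.search(text):
                counts[label] += 1
    return counts


# Toy tweets for illustration only (not drawn from the paper's dataset).
toy_tweets = [
    "Nurses still lack PPE in many hospitals",
    "Thank you to all health care workers",
    "New cases reported today; healthcare workers need personal protective equipment",
]
toy_counts = count_matching_tweets(toy_tweets)
```

Counting tweets rather than raw occurrences keeps a single tweet that repeats a term from inflating that term's prevalence.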
Taxonomy
Topics: Misinformation and Its Impacts · Computational and Text Analysis Methods · Sentiment Analysis and Opinion Mining
