Active Keyword Selection to Track Evolving Topics on Twitter
Sacha Lévy, Farimah Poursafaei, Kellin Pelrine, Reihaneh Rabbany

TL;DR
This paper introduces an active learning approach to optimize keyword selection for tracking evolving Twitter topics, significantly improving data relevance and quantity for social media research.
Contribution
It presents a novel active learning method for dynamic keyword refinement, enhancing Twitter data collection for evolving topics like COVID-19 sub-topics.
Findings
Average topic-related keyword recall is 2x higher than baselines
Method effectively tracks COVID-19 sub-topics
Open-source tools facilitate systematic data collection
Abstract
How can we study social interactions on evolving topics at a mass scale? Over the past decade, researchers from diverse fields such as economics, political science, and public health have often done this by querying Twitter's public API endpoints with hand-picked topical keywords to search or stream discussions. However, despite the API's accessibility, it remains difficult to select and update keywords to collect high-quality data relevant to topics of interest. In this paper, we propose an active learning method for rapidly refining query keywords to increase both the yielded topic relevance and dataset size. We leverage a large open-source COVID-19 Twitter dataset to illustrate the applicability of our method in tracking Tweets around the key sub-topics of Vaccine, Mask, and Lockdown. Our experiments show that our method achieves an average topic-related keyword recall 2x higher than…
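The abstract does not spell out the algorithm, so the following is only a minimal sketch of what an active keyword-refinement loop and a keyword-recall measure could look like. It is not the authors' method. The function names `keyword_recall` and `refine_keywords`, the co-occurrence ranking heuristic, and the toy tweets are all assumptions made for illustration.

```python
from collections import Counter


def keyword_recall(selected, relevant):
    """Fraction of known topic-related keywords that were selected."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(selected) & relevant) / len(relevant)


def refine_keywords(tweets, seeds, is_relevant, rounds=3):
    """Toy active loop (hypothetical, not the paper's algorithm).

    Each round, the unlabeled word that co-occurs most often with the
    accepted keywords is sent to a labeler (`is_relevant`), and is then
    either accepted as a query keyword or rejected.
    """
    accepted, rejected = set(seeds), set()
    for _ in range(rounds):
        counts = Counter()
        for text in tweets:
            words = set(text.lower().split())
            if words & accepted:
                counts.update(words - accepted - rejected)
        if not counts:
            break
        # Deterministic tie-break: highest count first, then alphabetical.
        candidate = min(counts, key=lambda w: (-counts[w], w))
        (accepted if is_relevant(candidate) else rejected).add(candidate)
    return accepted


# Invented toy data for the "Vaccine" sub-topic (illustration only).
tweets = [
    "vaccine dose booked today",
    "second dose of the vaccine",
    "pfizer dose arrived",
    "lockdown extended again",
    "pfizer vaccine dose",
]
relevant = {"vaccine", "dose", "pfizer"}

# The lambda stands in for a human annotator answering one query per round.
result = refine_keywords(tweets, {"vaccine"}, lambda w: w in relevant)
seed_recall = keyword_recall({"vaccine"}, relevant)
final_recall = keyword_recall(result, relevant)
```

In this toy run the labeler is asked about one candidate per round, which is the core idea of active selection: annotation effort goes only to the candidates the current keyword set makes most promising. Averaging `keyword_recall` over the Vaccine, Mask, and Lockdown sub-topics would give a figure comparable in spirit to the average recall the abstract reports, though the paper's exact definition may differ.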
Taxonomy
Topics: Complex Network Analysis Techniques · Advanced Text Analysis Techniques · Misinformation and Its Impacts
