A Pipeline for Post-Crisis Twitter Data Acquisition
Mayank Kejriwal, Yao Gu

TL;DR
This paper introduces a minimally supervised pipeline for rapid collection, filtering, and analysis of Twitter data post-crisis, demonstrated through a case study on the Las Vegas shootings.
Contribution
It presents a novel, minimally supervised pipeline that streamlines immediate post-crisis Twitter data acquisition and relevance filtering without extensive feature engineering.
Findings
The pipeline collected and analyzed millions of tweets after the Las Vegas shootings.
Active learning and fast text embeddings reduce the manual labeling and feature-engineering effort.
The case study illustrates the pipeline's utility for real-time crisis informatics.
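To make the active learning finding concrete, here is a minimal sketch of one common query strategy, uncertainty sampling, in which the tweets a classifier is least sure about are sent to a human annotator first. The summary does not say which query strategy the authors use, so this is an assumption for illustration only; the function name and the scores are hypothetical.

```python
def select_most_uncertain(probabilities, k):
    """Return indices of the k items whose relevance probability is closest to 0.5.

    Assumption: a binary relevant/irrelevant classifier that outputs a
    probability per tweet. The paper's actual query strategy may differ.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:k]


# Hypothetical classifier scores for five unlabeled tweets.
scores = [0.95, 0.52, 0.10, 0.45, 0.80]

# Indices of the two tweets to hand to an annotator next.
to_label = select_most_uncertain(scores, k=2)
```

In a full loop, the newly labeled tweets would be added to the training set, the classifier retrained, and the selection repeated until the labeling budget runs out.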
Abstract
Due to instant availability of data on social media platforms like Twitter, and advances in machine learning and data management technology, real-time crisis informatics has emerged as a prolific research area in the last decade. Although several benchmarks are now available, especially on portals like CrisisLex, an important, practical problem that has not been addressed thus far is the rapid acquisition and benchmarking of data from free, publicly available streams like the Twitter API. In this paper, we present ongoing work on a pipeline for facilitating immediate post-crisis data collection, curation and relevance filtering from the Twitter API. The pipeline is minimally supervised, alleviating the need for feature engineering by including a judicious mix of data preprocessing and fast text embeddings, along with an active learning framework. We illustrate the utility of the…
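As a rough illustration of how preprocessing and text embeddings can stand in for hand-built features, the sketch below normalizes a tweet and averages per-word vectors into one fixed-length representation. The specific cleaning rules, the placeholder tokens, the example tweet, and the tiny two-dimensional vector table are all assumptions made for this example; the abstract does not specify these details, and a real system would load pretrained embeddings instead of the toy table.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
TOKEN_RE = re.compile(r"[a-z0-9_<>']+")


def preprocess(tweet):
    """Lowercase a tweet, mask URLs and mentions, drop '#', and tokenize."""
    text = tweet.lower()
    text = URL_RE.sub(" <url> ", text)        # hypothetical placeholder token
    text = MENTION_RE.sub(" <user> ", text)   # hypothetical placeholder token
    text = text.replace("#", " ")             # keep the hashtag word itself
    return TOKEN_RE.findall(text)


def embed(tokens, vectors, dim):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# Toy embedding table standing in for pretrained word vectors (assumption).
toy_vectors = {"praying": [1.0, 0.0], "lasvegas": [0.0, 1.0]}

# Hypothetical example tweet, not taken from the paper's data.
tokens = preprocess("Praying for #LasVegas @CNN https://t.co/abc")
features = embed(tokens, toy_vectors, dim=2)
```

The resulting `features` list can be passed to any standard classifier, which is what lets the relevance filter avoid manual feature engineering.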
Taxonomy
Topics: Public Relations and Crisis Communication · Complex Network Analysis Techniques · Disaster Management and Resilience
