Active Learning for Identifying Disaster-Related Tweets: A Comparison with Keyword Filtering and Generic Fine-Tuning
David Hanny, Sebastian Schmidt, Bernd Resch

TL;DR
This paper explores Active Learning with RoBERTa models for identifying disaster-related tweets. It reports that this approach outperforms keyword filtering and generic fine-tuning while requiring little labeling effort.
Contribution
It demonstrates that combining Active Learning with fine-tuned RoBERTa models significantly improves disaster tweet classification performance with less labeled data.
Findings
Active Learning with RoBERTa outperforms keyword filtering.
A few rounds of AL are enough to reach high classification accuracy.
A broadly applicable disaster tweet classifier can be developed with minimal labeling effort.
Abstract
Information from social media can provide essential information for emergency response during natural disasters in near real-time. However, it is difficult to identify the disaster-related posts among the large amounts of unstructured data available. Previous methods often use keyword filtering, topic modelling or classification-based techniques to identify such posts. Active Learning (AL) presents a promising sub-field of Machine Learning (ML) that has not been used much in the field of text classification of social media content. This study therefore investigates the potential of AL for identifying disaster-related Tweets. We compare a keyword filtering approach, a RoBERTa model fine-tuned with generic data from CrisisLex, a base RoBERTa model trained with AL and a fine-tuned RoBERTa model trained with AL regarding classification performance. For testing, data from CrisisLex and…
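To make the compared approaches more concrete, the following is a minimal sketch of a keyword filter and a pool-based Active Learning loop. It is not the authors' implementation. The visible part of the abstract does not state the query strategy, so uncertainty sampling is an assumption here. A tiny naive Bayes classifier stands in for RoBERTa so that the example runs with the Python standard library only. The keyword list, the example posts, and the `oracle` function (a simulated annotator) are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical keyword list for illustration; not the list used in the paper.
KEYWORDS = {"flood", "earthquake", "wildfire", "hurricane"}

def tokenize(text):
    return text.lower().split()

def keyword_filter(text):
    """Baseline: flag a post if any token matches a keyword."""
    return any(tok.strip(".,!?#") in KEYWORDS for tok in tokenize(text))

class NaiveBayes:
    """Toy stand-in for a transformer classifier (assumption, not RoBERTa)."""

    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.docs = Counter(labels)
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict_proba(self, text):
        """Return the estimated probability that the post is disaster-related."""
        n_docs = sum(self.docs.values())
        logp = {}
        for label in (0, 1):
            total = sum(self.counts[label].values())
            lp = math.log((self.docs[label] + 1) / (n_docs + 2))
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore unseen tokens
                    lp += math.log((self.counts[label][tok] + 1)
                                   / (total + len(self.vocab)))
            logp[label] = lp
        top = max(logp.values())
        e0, e1 = math.exp(logp[0] - top), math.exp(logp[1] - top)
        return e1 / (e0 + e1)

def active_learning(seed, pool, oracle, rounds=2, batch_size=2):
    """Pool-based AL with uncertainty sampling (assumed query strategy)."""
    labeled, pool = list(seed), list(pool)
    model = NaiveBayes().fit(*zip(*labeled))
    for _ in range(rounds):
        if not pool:
            break
        # Query the posts whose predicted probability is closest to 0.5.
        pool.sort(key=lambda t: abs(model.predict_proba(t) - 0.5))
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled += [(t, oracle(t)) for t in batch]
        model = NaiveBayes().fit(*zip(*labeled))  # retrain on the larger set
    return model, labeled

# Hypothetical data; the oracle simulates a human annotator.
seed = [("flood waters rising downtown", 1), ("coffee with friends today", 0)]
pool = ["earthquake shakes the region", "lovely sunny afternoon walk",
        "wildfire smoke covers town", "new phone arrived today"]
oracle = lambda text: int(keyword_filter(text))
model, labeled = active_learning(seed, pool, oracle)
```

In each round, only the posts the current model is least sure about are sent for labeling, which is how AL aims to reduce annotation effort compared with labeling the whole pool.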
Taxonomy
Methods: Attention Is All You Need · Softmax · Linear Layer · Attention Dropout · Dropout · WordPiece · Residual Connection · Layer Normalization · Multi-Head Attention
