Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages
Muhammad Imran, Prasenjit Mitra, Carlos Castillo

TL;DR
This paper introduces human-annotated Twitter datasets from 19 crises, along with word embeddings and lexical resources, to improve NLP tasks like classification during emergencies.
Contribution
It provides the largest crisis-related Twitter corpora, new lexical resources, and trained word embeddings to advance NLP applications in disaster response.
Findings
Effective classifiers trained on the annotated data.
Largest crisis-related Twitter word embeddings created.
Normalized lexical resources for noisy social media language.
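The normalized lexical resources map noisy, informal spellings to canonical forms. Below is a minimal sketch of how such a lexicon might be applied before classification. It assumes a plain dictionary lookup after lowercasing and tokenizing. The lexicon entries, the tokenizer pattern, and the `normalize_tweet` helper are hypothetical illustrations. They are not taken from the released resource or from the authors' preprocessing.

```python
import re

# Hypothetical normalization lexicon: noisy token -> normalized form.
# Entries are illustrative only, not taken from the released resource.
NORMALIZATION_LEXICON = {
    "pls": "please",
    "plz": "please",
    "u": "you",
    "ppl": "people",
    "2morrow": "tomorrow",
}

# Keep hashtags/mentions as single tokens; split punctuation into its own tokens.
TOKEN_RE = re.compile(r"[#@]?\w+|[^\w\s]")


def normalize_tweet(text, lexicon=NORMALIZATION_LEXICON):
    """Lowercase and tokenize a tweet, replacing tokens found in the lexicon."""
    tokens = TOKEN_RE.findall(text.lower())
    return [lexicon.get(tok, tok) for tok in tokens]


if __name__ == "__main__":
    print(normalize_tweet("Pls help, ppl trapped near the bridge!"))
```

Tokens missing from the lexicon pass through unchanged. This keeps the step safe to run on every message, even when the lexicon covers only a fraction of the noisy vocabulary.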
Abstract
Microblogging platforms such as Twitter provide active communication channels during mass convergence and emergency events such as earthquakes and typhoons. During the sudden onset of a crisis, affected people post useful information on Twitter that can be used for situational awareness and other humanitarian disaster response efforts, if it is processed in a timely and effective manner. Processing social media information poses multiple challenges, among them parsing noisy, brief, and informal messages, learning information categories from the incoming stream of messages, and classifying messages into those categories. One of the basic necessities of many of these tasks is the availability of data, in particular human-annotated data. In this paper, we present human-annotated Twitter corpora collected during 19 different crises that took place between 2013 and 2015. To demonstrate the utility…
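To make the classification task concrete, here is a minimal sketch of a bag-of-words multinomial Naive Bayes classifier with add-one smoothing. This is one standard baseline for short-text categorization. It is not necessarily the model used in the paper. The category names, the toy training tweets, and the helper names (`NaiveBayesTweetClassifier`, `build_toy_classifier`) are hypothetical and do not come from the released corpora.

```python
import math
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase a message and split it into word tokens."""
    return TOKEN_RE.findall(text.lower())


class NaiveBayesTweetClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        self.n_docs = len(labels)
        return self

    def predict(self, text):
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each known token.
            score = math.log(doc_count / self.n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def build_toy_classifier():
    # Hypothetical toy tweets and labels, for illustration only.
    texts = [
        "bridge collapsed and roads damaged",
        "power lines down, building damaged",
        "two people injured in the earthquake",
        "many dead and injured after typhoon",
        "praying for everyone affected",
        "thoughts and prayers for the victims",
    ]
    labels = [
        "infrastructure_damage", "infrastructure_damage",
        "injured_or_dead", "injured_or_dead",
        "sympathy", "sympathy",
    ]
    return NaiveBayesTweetClassifier().fit(texts, labels)


if __name__ == "__main__":
    clf = build_toy_classifier()
    print(clf.predict("the bridge is damaged"))  # -> infrastructure_damage
```

Scoring in log space avoids underflow when many small probabilities are multiplied. Skipping unseen words keeps each decision tied to vocabulary observed during training. A real setup would train on the annotated corpora rather than on toy examples.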
Taxonomy
Topics: Public Relations and Crisis Communication · Sentiment Analysis and Opinion Mining · Disaster Management and Resilience
