TweetDIS: A Large Twitter Dataset for Natural Disasters Built using Weak Supervision

Ramya Tekumalla; Juan M. Banda

arXiv:2207.04947·cs.CL·July 12, 2022

TL;DR

This paper introduces TweetDIS, a large silver-standard Twitter dataset for natural disasters built with weak supervision. Models trained on it identify earthquake, hurricane, and flood tweets with performance above 90% on a manually curated gold standard.

Contribution

The authors present a novel large-scale weakly supervised dataset for natural disaster detection on Twitter, reducing reliance on labor-intensive human annotation.

Findings

01. Models trained on the silver-standard dataset achieved performance above 90% on a manually curated gold standard.

02. The dataset supports classification of tweets about earthquakes, hurricanes, and floods.

03. The dataset is publicly released for research use.

Abstract

Social media is often used as a lifeline for communication during natural disasters. Traditionally, natural disaster tweets are filtered from the Twitter stream using the name of the natural disaster, and the filtered tweets are sent for human annotation. Human annotation to create labeled sets for machine learning models is laborious, time-consuming, at times inaccurate, and, more importantly, not scalable in terms of size and real-time use. In this work, we curate a silver-standard dataset using weak supervision. To validate its utility, we train machine learning models on the weakly supervised data to identify three types of natural disasters, i.e., earthquakes, hurricanes, and floods. Our results demonstrate that models trained on the silver-standard dataset achieved performance greater than 90% when classifying a manually curated, gold-standard…
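
To illustrate the general idea of weak supervision described in the abstract, the following is a minimal sketch of keyword-based silver labeling. It is not the authors' pipeline: the keyword lists, the `weak_label` function, and the rule of abstaining on ambiguous tweets are all assumptions made here for illustration.

```python
import re
from typing import Optional

# Hypothetical keyword lists (assumption): the heuristics actually used
# to build TweetDIS are not given in the abstract.
KEYWORDS = {
    "earthquake": ["earthquake", "aftershock", "magnitude"],
    "hurricane": ["hurricane", "storm surge", "landfall"],
    "flood": ["flood", "flooding", "flash flood"],
}

# One case-insensitive, whole-word pattern per disaster type.
PATTERNS = {
    label: re.compile(
        r"\b(?:" + "|".join(re.escape(word) for word in words) + r")\b",
        re.IGNORECASE,
    )
    for label, words in KEYWORDS.items()
}


def weak_label(tweet: str) -> Optional[str]:
    """Return a silver label if exactly one disaster type matches, else None."""
    hits = [label for label, pattern in PATTERNS.items() if pattern.search(tweet)]
    # Abstain when no type or more than one type matches (illustrative choice).
    return hits[0] if len(hits) == 1 else None


if __name__ == "__main__":
    examples = [
        "Magnitude 6.1 earthquake hits the coast",
        "Flooding after the hurricane made landfall",
        "Nice weather today",
    ]
    for text in examples:
        print(weak_label(text), "|", text)
```

Abstaining on tweets that match several disaster types trades coverage for cleaner silver labels. A real pipeline would likely combine several noisy heuristics rather than rely on a single keyword rule.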
