TL;DR
GeoCoV19 is a multilingual Twitter dataset of more than 524 million tweets with inferred geolocation, designed to aid research on COVID-19 information dissemination, societal response, and disease surveillance during the pandemic.
Contribution
This paper introduces GeoCoV19, a large-scale, geolocated, multilingual COVID-19 Twitter dataset, enabling advanced social media analysis for pandemic response and research.
Findings
More than 524 million multilingual tweets collected over 90 days, starting February 1, 2020.
A gazetteer-based method for inferring tweet geolocation.
Dataset supports research on misinformation, community response, and disease modeling.
Abstract
The past several years have witnessed a huge surge in the use of social media platforms during mass convergence events such as health emergencies, natural or human-induced disasters. These non-traditional data sources are becoming vital for disease forecasts and surveillance when preparing for epidemic and pandemic outbreaks. In this paper, we present GeoCoV19, a large-scale Twitter dataset containing more than 524 million multilingual tweets posted over a period of 90 days since February 1, 2020. Moreover, we employ a gazetteer-based approach to infer the geolocation of tweets. We postulate that this large-scale, multilingual, geolocated social media data can empower the research communities to evaluate how societies are collectively coping with this unprecedented global crisis as well as to develop computational methods to address challenges such as identifying fake news,…
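The abstract names a gazetteer-based approach but this summary gives no details of it. As a rough illustration only, a gazetteer lookup can be sketched as matching word n-grams from tweet text against a table of known place names, preferring longer matches. Everything below (the `GAZETTEER` entries, `MAX_NGRAM`, and the `infer_location` function) is hypothetical and not taken from the paper; the authors' actual pipeline may differ substantially.

```python
import re

# Toy gazetteer: place name -> (country code, latitude, longitude).
# Entries are illustrative assumptions; a real gazetteer holds far more names.
GAZETTEER = {
    "new york": ("US", 40.7128, -74.0060),
    "york": ("GB", 53.9600, -1.0873),
    "doha": ("QA", 25.2854, 51.5310),
    "milan": ("IT", 45.4642, 9.1900),
}
MAX_NGRAM = 3  # longest place name, in words, that we try to match


def infer_location(text):
    """Return (matched_name, gazetteer_entry) for the longest place-name
    n-gram found in the text, or None if nothing matches."""
    tokens = re.findall(r"\w+", text.lower())
    # Try longer n-grams first so "new york" wins over "york".
    for n in range(min(MAX_NGRAM, len(tokens)), 0, -1):
        for i in range(len(tokens) - n + 1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in GAZETTEER:
                return candidate, GAZETTEER[candidate]
    return None


print(infer_location("Lockdown in New York today"))
print(infer_location("Stay home, stay safe"))
```

Trying longer n-grams first is one simple way to reduce ambiguity between nested names. A realistic system would also need to handle place names that collide with common words, names shared by several places, and text in many languages, none of which this sketch attempts.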
