TBCOV: Two Billion Multilingual COVID-19 Tweets with Sentiment, Entity, Geo, and Gender Labels
Muhammad Imran, Umair Qazi, Ferda Ofli

TL;DR
TBCOV is a comprehensive, multilingual COVID-19 Twitter dataset with over two billion tweets, enriched with sentiment, entity, geo, and gender labels, enabling advanced analysis of public opinion and misinformation during the pandemic.
Contribution
This work introduces TBCOV, a large-scale multilingual COVID-19 Twitter dataset with extensive annotations, together with a novel geotagging method that supports detailed spatial analysis.
Findings
Revealed insights into public sentiment and trending topics during COVID-19.
Demonstrated the dataset's utility for misinformation and situational analysis.
Confirmed broad coverage of critical COVID-19 related issues.
Abstract
The widespread usage of social networks during mass convergence events, such as health emergencies and disease outbreaks, provides instant access to citizen-generated data that carry rich information about public opinions, sentiments, urgent needs, and situational reports. Such information can help authorities understand the emergent situation and react accordingly. Moreover, social media plays a vital role in tackling misinformation and disinformation. This work presents TBCOV, a large-scale Twitter dataset comprising more than two billion multilingual tweets related to the COVID-19 pandemic collected worldwide over a continuous period of more than one year. More importantly, several state-of-the-art deep learning models are used to enrich the data with important attributes, including sentiment labels, named entities (e.g., mentions of persons, organizations, locations), user types,…
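As a rough illustration of how per-tweet labels of the kind described above might be used, the following minimal Python sketch counts sentiment labels per country. The record layout is an assumption made for this example: the class `EnrichedTweet`, its field names, the label values, and the helper `sentiment_by_country` are all hypothetical and are not taken from the TBCOV release, whose actual schema may differ.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class EnrichedTweet:
    # Hypothetical record layout; field names are assumptions, not the TBCOV schema.
    tweet_id: str
    lang: str
    sentiment: str                      # assumed labels: "positive", "neutral", "negative"
    country: Optional[str] = None       # None when no location could be resolved
    entities: List[str] = field(default_factory=list)


def sentiment_by_country(tweets: Iterable[EnrichedTweet]) -> Dict[str, Dict[str, int]]:
    """Count sentiment labels per country, skipping tweets with no country."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        if tweet.country is not None:
            counts[tweet.country][tweet.sentiment] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {country: dict(labels) for country, labels in counts.items()}


# Made-up sample records, for illustration only.
sample = [
    EnrichedTweet("1", "en", "negative", "US", ["WHO"]),
    EnrichedTweet("2", "es", "positive", "ES"),
    EnrichedTweet("3", "en", "negative", "US"),
    EnrichedTweet("4", "ar", "neutral"),          # no resolved country, so it is skipped
    EnrichedTweet("5", "en", "positive", "US"),
]

summary = sentiment_by_country(sample)
print(summary)
```

Tweets without a resolved country are dropped here rather than grouped under a placeholder; whether that is appropriate depends on the analysis.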
Taxonomy
Topics: Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining · Data-Driven Disease Surveillance
