BillionCOV: An Enriched Billion-scale Collection of COVID-19 tweets for Efficient Hydration

Rabindra Lamsal; Maria Rodriguez Read; Shanika Karunasekera

arXiv:2301.11284 · cs.SI · September 13, 2023

TL;DR

BillionCOV is a comprehensive, large-scale COVID-19 tweet dataset with 1.4 billion tweets from around the world, designed to enable efficient data hydration and address issues of redundancy and data loss in prior datasets.

Contribution

This paper introduces BillionCOV, a large-scale COVID-19 tweet dataset that improves data quality and hydration efficiency compared to existing datasets.

Findings

01 Contains 1.4 billion tweets from 240 countries and territories

02 Addresses redundancy and identifiers that point to deleted or protected tweets

03 Facilitates efficient tweet hydration for researchers

Abstract

The COVID-19 pandemic introduced new norms such as social distancing, face masks, quarantine, lockdowns, travel restrictions, work/study from home, and business closures, to name a few. The pandemic's seriousness made people vocal on social media, especially on microblogs such as Twitter. Researchers have been collecting and sharing large-scale datasets of COVID-19 tweets since the early days of the outbreak. Sharing raw Twitter data with third parties is restricted; users need to hydrate the tweet identifiers in a public dataset to re-create the dataset locally. Large-scale datasets that include original tweets, retweets, quotes, and replies contain billions of tweets, which take months to hydrate. The existing datasets carry issues related to proportion and redundancy. We report that more than 500 million tweet identifiers point to deleted or protected tweets. In order to address these…
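To make the hydration cost concrete, the following is a minimal sketch, not the authors' pipeline. It removes redundant identifiers, groups the rest into lookup batches, and estimates how long hydration would take. The function names are hypothetical, and the batch size of 100 identifiers per request and the limit of 300 requests per 15-minute window are assumptions chosen for illustration; check them against the terms of whatever API access you have.

```python
from typing import Iterable, Iterator, List


def batch_unique_ids(ids: Iterable[str], batch_size: int = 100) -> Iterator[List[str]]:
    """Yield batches of unique tweet IDs, preserving first-seen order.

    batch_size=100 is an assumed per-request lookup limit.
    """
    seen = set()
    batch: List[str] = []
    for raw in ids:
        tweet_id = raw.strip()
        if not tweet_id or tweet_id in seen:
            continue  # skip blank lines and redundant identifiers
        seen.add(tweet_id)
        batch.append(tweet_id)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly shorter, batch


def estimated_days(
    n_ids: int,
    batch_size: int = 100,           # assumed identifiers per request
    requests_per_window: int = 300,  # assumed rate limit
    window_minutes: int = 15,        # assumed rate-limit window
) -> float:
    """Estimate wall-clock days needed to hydrate n_ids identifiers."""
    requests_needed = -(-n_ids // batch_size)  # ceiling division
    windows = requests_needed / requests_per_window
    return windows * window_minutes / (60 * 24)


if __name__ == "__main__":
    sample = ["1", "2", "1", "", "3"]
    print(list(batch_unique_ids(sample, batch_size=2)))
    print(round(estimated_days(1_400_000_000), 1))
```

Under these assumed limits, 1.4 billion identifiers work out to about 486 days of continuous requests, which is consistent with the abstract's point that billion-scale hydration takes months. Every identifier that is redundant or points to a deleted or protected tweet spends part of that budget without returning data.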

Taxonomy

Topics: Misinformation and Its Impacts