ArCOV-19: The First Arabic COVID-19 Twitter Dataset with Propagation Networks
Fatima Haouari, Maram Hasanain, Reem Suwaileh, Tamer Elsayed

TL;DR
This paper introduces ArCOV-19, a one-year Arabic COVID-19 Twitter dataset with propagation networks that supports research in NLP, social computing, and information retrieval. The release also includes tools for dataset curation.
Contribution
First publicly available Arabic COVID-19 Twitter dataset with propagation networks, covering one year and supporting multiple research domains.
Findings
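Preliminary analysis shows the dataset captures rising discussions associated with the first reported COVID-19 cases in the Arab world
Includes propagation networks (retweets and reply threads) for the most-popular tweets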
Provides tools for dataset curation
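
To make the idea of a propagation network concrete, here is a minimal Python sketch of one possible in-memory representation: a source tweet, the retweets of it, and a tree of replies. The class names, field names, and helper function are hypothetical and chosen for illustration only; they are assumptions, not the ArCOV-19 release format or the authors' tooling.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Tweet:
    # Hypothetical minimal record; NOT the ArCOV-19 release schema.
    tweet_id: str
    in_reply_to: Optional[str] = None  # parent tweet ID if this is a reply
    retweet_of: Optional[str] = None   # source tweet ID if this is a retweet


@dataclass
class PropagationNetwork:
    source_id: str
    # IDs of tweets that retweet the source tweet.
    retweets: List[str] = field(default_factory=list)
    # Maps a parent tweet ID to the IDs of its direct replies.
    replies: Dict[str, List[str]] = field(
        default_factory=lambda: defaultdict(list)
    )

    def thread_size(self) -> int:
        """Count every reply reachable from the source tweet."""
        count, stack = 0, [self.source_id]
        while stack:
            children = self.replies.get(stack.pop(), [])
            count += len(children)
            stack.extend(children)
        return count


def build_network(source_id: str, tweets: List[Tweet]) -> PropagationNetwork:
    """Group retweets and replies around one source tweet (illustrative)."""
    net = PropagationNetwork(source_id)
    for t in tweets:
        if t.retweet_of == source_id:
            net.retweets.append(t.tweet_id)
        elif t.in_reply_to is not None:
            net.replies[t.in_reply_to].append(t.tweet_id)
    return net


# Toy data: one retweet, a two-level reply thread, and an unrelated reply.
sample = [
    Tweet("r1", retweet_of="s"),
    Tweet("a", in_reply_to="s"),
    Tweet("b", in_reply_to="a"),
    Tweet("x", in_reply_to="other"),
]
net = build_network("s", sample)
```

In this toy example, `net.retweets` holds the single retweet and `net.thread_size()` counts only the two replies connected to the source; the reply to an unrelated tweet is stored but never reached by the traversal.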
Abstract
In this paper, we present ArCOV-19, an Arabic COVID-19 Twitter dataset that spans one year, covering the period from 27 January 2020 to 31 January 2021. ArCOV-19 is the first publicly available Arabic Twitter dataset covering the COVID-19 pandemic; it includes about 2.7M tweets alongside the propagation networks of the most-popular subset of them (i.e., most-retweeted and -liked). The propagation networks include both retweets and conversational threads (i.e., threads of replies). ArCOV-19 is designed to enable research in several domains, including natural language processing, information retrieval, and social computing. Preliminary analysis shows that ArCOV-19 captures rising discussions associated with the first reported cases of the disease as they appeared in the Arab world. In addition to the source tweets and propagation networks, we also release the search queries and…
Taxonomy
Topics: Misinformation and Its Impacts · Data-Driven Disease Surveillance · Sentiment Analysis and Opinion Mining
