IRLCov19: A Large COVID-19 Multilingual Twitter Dataset of Indian Regional Languages
Deepak Uniyal, Amit Agarwal

TL;DR
This paper introduces IRLCov19, a large multilingual Twitter dataset from India focused on COVID-19, highlighting language usage patterns and providing a resource for researchers and policymakers.
Contribution
It presents a comprehensive COVID-19 Twitter dataset in Indian regional languages, analyzing language distribution and usage during the pandemic.
Findings
English accounts for over 64% of tweets.
Twelve regional languages make up approximately 4.77% of tweets.
The dataset covers tweets from February to July 2020.
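Language shares like those above can be computed from per-tweet language tags. The following is a minimal sketch in Python, not the authors' pipeline. It assumes each tweet is a dict with a `lang` key holding a language code. The function name `language_shares` and the toy sample are hypothetical and are not drawn from the IRLCov19 data.

```python
from collections import Counter


def language_shares(tweets):
    """Return a {language_code: percentage_of_tweets} mapping.

    Assumption: each tweet is a dict with a "lang" key; tweets without
    one are counted under "und" (undetermined).
    """
    counts = Counter(t.get("lang", "und") for t in tweets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {lang: 100.0 * n / total for lang, n in counts.items()}


# Toy input for illustration only; these are not figures from the dataset.
sample = [{"lang": "en"}, {"lang": "en"}, {"lang": "hi"}, {"lang": "ta"}]
shares = language_shares(sample)
```

On a real collection, the same function would be applied to the parsed tweet records, and the shares of the regional-language codes of interest summed to get a combined figure.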
Abstract
COVID-19, which emerged in Wuhan, China, in December 2019, continues to spread rapidly across the world even though authorities have made a number of vaccines available. Although the coronavirus has been around for a significant period of time, people and authorities still see a need for awareness because the virus mutates, so its symptoms and prevention strategies vary. People and authorities rely most on social media platforms to share awareness information and voice their opinions, because these platforms spread the word widely in practically no time. People use a number of languages to communicate on social media, depending on their familiarity with a language, its reach, and its availability on the platform. The entire world has been hit by the coronavirus and India is the second worst-hit country in terms of the number of active coronavirus…
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Misinformation and Its Impacts · COVID-19 diagnosis using AI
