NAIST COVID: Multilingual COVID-19 Twitter and Weibo Dataset

Zhiwei Gao; Shuntaro Yada; Shoko Wakamiya; Eiji Aramaki

arXiv:2004.08145 · cs.SI · April 20, 2020 · 24 cites

TL;DR

This paper introduces a multilingual COVID-19 social media dataset drawn from Twitter (English and Japanese) and Weibo (Chinese). It covers early-pandemic posts and is intended to support social media analysis of the pandemic.

Contribution

It provides a publicly available multilingual social media dataset related to COVID-19, including analysis tools like daily word clouds for text-mining research.
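
A daily word cloud is typically built from per-day term frequencies. The following is a minimal sketch of that counting step, not the authors' tooling: the sample posts, the `STOPWORDS` set, and the function names `daily_word_counts` and `top_words` are all hypothetical, and the regex tokenizer only handles whitespace-delimited English. Japanese and Chinese posts would need a word segmenter, which is omitted here.

```python
from collections import Counter, defaultdict
from datetime import date
import re

# Hypothetical in-memory sample; the released dataset's file format
# is not described on this page, so no loading code is shown.
SAMPLE_POSTS = [
    (date(2020, 1, 20), "Wuhan virus news: new virus cases reported"),
    (date(2020, 1, 20), "New cases in Wuhan"),
    (date(2020, 3, 24), "Stay at home and keep social distancing"),
]

# Illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "and", "at", "in", "of", "the"}


def daily_word_counts(posts):
    """Map each posting date to a Counter of its lowercase word frequencies."""
    counts = defaultdict(Counter)
    for day, text in posts:
        # English-only tokenization: runs of ASCII letters.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts[day].update(t for t in tokens if t not in STOPWORDS)
    return counts


def top_words(counts, day, k=3):
    """Return the k most frequent (word, count) pairs for one day."""
    return counts[day].most_common(k)


if __name__ == "__main__":
    counts = daily_word_counts(SAMPLE_POSTS)
    for day in sorted(counts):
        print(day.isoformat(), top_words(counts, day))
```

The per-day `Counter` objects can then be passed to whatever rendering library is used to draw the clouds; the rendering step is left out because this page does not say how the authors produced theirs.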

Findings

01 The dataset covers posts from January 20 to March 24, 2020

02 It includes posts in English and Japanese (from Twitter) and Chinese (from Weibo)

03 It demonstrates potential for text mining and social media analysis

Abstract

Since the outbreak of coronavirus disease 2019 (COVID-19) in late 2019, the disease has affected over 200 countries and billions of people worldwide. It has disrupted people's social lives through measures such as "social distancing" and "stay at home," which in turn has increased interaction through social media. Given that social media can provide valuable information about COVID-19 at a global scale, it is important to share the data and encourage social media studies of COVID-19 and other infectious diseases. Therefore, we have released a multilingual dataset of social media posts related to COVID-19, consisting of microblogs in English and Japanese from Twitter and those in Chinese from Weibo. The data cover microblogs from January 20, 2020, to March 24, 2020. This paper also provides a quantitative as well as qualitative analysis of these datasets by creating…

Code & Models

Repositories

sociocom/covid19_dataset (official)

Taxonomy

Topics: Misinformation and Its Impacts · Data-Driven Disease Surveillance · Sentiment Analysis and Opinion Mining