MM-COVID: A Multilingual and Multimodal Data Repository for Combating COVID-19 Disinformation
Yichuan Li, Bohan Jiang, Kai Shu, Huan Liu

TL;DR
This paper introduces MM-COVID, a multilingual and multimodal dataset of COVID-19 fake news and trustworthy information across six languages, aimed at improving fake news detection and analysis.
Contribution
The creation of a comprehensive multilingual fake news dataset with social context for COVID-19, facilitating research in multilingual fake news detection and social media analysis.
Findings
The dataset includes 3981 fake news items and 7192 trustworthy items across six languages (English, Spanish, Portuguese, Hindi, French, and Italian).
Detailed analysis of the dataset reveals linguistic and social patterns in COVID-19 misinformation.
Demonstrates the dataset's utility in various fake news detection applications.
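The two counts reported above imply a moderately imbalanced label distribution, which matters when evaluating a detector. The following is a minimal sketch of that arithmetic, not code from the paper: the function name `class_balance` and the inverse-frequency weighting (total / (number of classes × class count)) are illustrative assumptions, and only the two counts come from the source.

```python
# Counts taken from the summary; everything else here is an illustrative assumption.
FAKE_COUNT = 3981
TRUSTWORTHY_COUNT = 7192


def class_balance(fake: int, trustworthy: int) -> dict:
    """Return label shares, a majority-class baseline, and inverse-frequency weights."""
    total = fake + trustworthy
    n_classes = 2
    return {
        "total": total,
        "fake_share": fake / total,
        "trustworthy_share": trustworthy / total,
        # Accuracy of a trivial classifier that always predicts the larger class.
        "majority_baseline_accuracy": max(fake, trustworthy) / total,
        # Common "balanced" heuristic: total / (n_classes * class_count).
        "weight_fake": total / (n_classes * fake),
        "weight_trustworthy": total / (n_classes * trustworthy),
    }


stats = class_balance(FAKE_COUNT, TRUSTWORTHY_COUNT)

if __name__ == "__main__":
    for name, value in stats.items():
        print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

Under these assumptions, roughly a third of the items are fake, so a detector's accuracy should be read against the majority-class baseline rather than against 50%.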
Abstract
The COVID-19 epidemic is considered a global health crisis for the whole of society and the greatest challenge mankind has faced since World War Two. Unfortunately, fake news about COVID-19 is spreading as fast as the virus itself. Incorrect health measures, anxiety, and hate speech will have bad consequences for people's physical health, as well as their mental health, all over the world. To help better combat COVID-19 fake news, we propose a new fake news detection dataset, MM-COVID (Multilingual and Multidimensional COVID-19 Fake News Data Repository). This dataset provides multilingual fake news and the relevant social context. We collect 3981 pieces of fake news content and 7192 pieces of trustworthy information in six languages: English, Spanish, Portuguese, Hindi, French, and Italian. We present a detailed and exploratory analysis of MM-COVID from different…
Taxonomy
Topics: Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining
