Looking for COVID-19 misinformation in multilingual social media texts
Raj Ratn Pranesh, Mehrdad Farokhnejad, Ambesh Shekhar, and Genoveva Vargas-Solar

TL;DR
This paper introduces CMTA, a multilingual data science pipeline that uses machine learning models to detect and analyze COVID-19 misinformation in social media texts across languages. In a comparison with eight monolingual models, the authors report that CMTA surpasses various of them.
Contribution
The paper presents a novel multilingual misinformation detection pipeline that combines Dense-CNN and MBERT, demonstrating superior performance over monolingual models in COVID-19 social media texts.
Findings
CMTA outperforms monolingual models in misinformation detection.
Misinformation about COVID-19 spread across multiple languages.
Identified COVID-19 misinformation trends during early pandemic months.
Abstract
This paper presents the Multilingual COVID-19 Analysis Method (CMTA) for detecting and observing the spread of misinformation about this disease within texts. CMTA proposes a data science (DS) pipeline that applies machine learning models for processing, classifying (Dense-CNN) and analyzing (MBERT) multilingual (micro)-texts. The pipeline's data preparation tasks extract features from multilingual textual data and categorize the data into specific information classes (e.g., 'false', 'partly false', 'misleading'). The CMTA pipeline was tested on multilingual micro-texts (tweets), showing misinformation spread across different languages. To assess the performance of CMTA and put it in perspective, we performed a comparative analysis of CMTA with eight monolingual models used for detecting misinformation. The comparison shows that CMTA has surpassed various monolingual models and…
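The pipeline shape described in the abstract (prepare the micro-text, then assign it to an information class) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the cleaning rules in `prepare`, the function names, and the pluggable `classify` argument are all assumptions, and the trained Dense-CNN and MBERT models are replaced by whatever callable the caller supplies. The label tuple contains only the three example classes named in the abstract; the paper may define others.

```python
import re
import unicodedata
from typing import Callable, Iterable, List, Tuple

# Example classes quoted in the abstract; the full label set is an assumption.
LABELS = ("false", "partly false", "misleading")

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def prepare(text: str) -> str:
    """Hypothetical data preparation step for a tweet-like micro-text.

    Normalizes Unicode, drops URLs and @mentions, collapses whitespace,
    and lowercases. The paper's actual preparation tasks may differ.
    """
    text = unicodedata.normalize("NFKC", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return " ".join(text.split()).lower()


def run_pipeline(
    texts: Iterable[str],
    classify: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Prepare each text, classify it, and return (original text, label) pairs.

    `classify` stands in for a trained multilingual model; it must return
    one of LABELS, otherwise a ValueError is raised.
    """
    results = []
    for original in texts:
        label = classify(prepare(original))
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        results.append((original, label))
    return results
```

Keeping the classifier as a parameter means the same preparation and validation steps can be reused with a monolingual or a multilingual model, which is the kind of comparison the abstract describes.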
