Constraint 2021: Machine Learning Models for COVID-19 Fake News Detection Shared Task
Thomas Felber

TL;DR
This paper presents a machine learning approach that combines linguistic features with classical algorithms to detect COVID-19 fake news on social media, reaching a weighted average F1 score of 95.19% on the shared task's test data.
Contribution
The author develops a system that combines linguistic features with classical ML algorithms, notably a linear SVM, for COVID-19 fake news detection in a shared task setting.
Findings
Best system is a linear SVM with a weighted average F1 score of 95.19% on test data
Linguistic features improve classification performance
System ranks 80th out of 167 on the leaderboard
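The reported score is a weighted average F1: each class's F1 is weighted by its share of the true labels. The following is a minimal sketch of that metric, not the shared task's official scorer; the function name `weighted_f1` and the toy labels are assumptions made up for illustration.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighted by each class's support."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = n - tp
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is >= 1 because n >= 1.
        f1 = 2 * tp / (2 * tp + fp + fn)
        score += (n / total) * f1
    return score


# Toy labels, invented for illustration only (not data from the shared task).
y_true = ["fake", "fake", "real", "real", "real"]
y_pred = ["fake", "real", "real", "real", "real"]
print(round(weighted_f1(y_true, y_pred), 4))
```

With two roughly balanced classes, as in a fake/real setup, the weighted average stays close to the plain mean of the two per-class F1 scores.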
Abstract
In this system paper, we present our contribution to the Constraint 2021 COVID-19 Fake News Detection Shared Task, which poses the challenge of classifying COVID-19 related social media posts as either fake or real. We address this challenge by applying classical machine learning algorithms together with several linguistic features, such as n-grams, readability, emotional tone and punctuation. For pre-processing, we experiment with steps such as stop word removal, stemming/lemmatization, link removal and more. Our best performing system is based on a linear SVM, which obtains a weighted average F1 score of 95.19% on test data and places in the middle of the leaderboard (place 80 of 167).
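Some of the pre-processing steps and features named in the abstract can be sketched with the Python standard library alone. This is an illustrative sketch under stated assumptions, not the paper's implementation: the stop word list, the tokenizer regex, the feature names, and the use of average token length as a readability proxy are all hypothetical choices. Stemming/lemmatization and emotional tone are omitted because they would need external resources.

```python
import re
import string
from collections import Counter

# Tiny illustrative stop word list (an assumption; a real system would use a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and"}
URL_RE = re.compile(r"https?://\S+")


def preprocess(text):
    """Remove links, lowercase, tokenize, and drop stop words."""
    text = URL_RE.sub(" ", text).lower()
    tokens = re.findall(r"[a-z0-9']+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text):
    """Build a sparse feature dict: n-gram counts, punctuation counts, a readability proxy."""
    tokens = preprocess(text)
    features = Counter()
    for n in (1, 2):  # unigrams and bigrams
        for gram in ngrams(tokens, n):
            features[f"ngram={gram}"] += 1
    # Count punctuation on the raw text, after removing links so URL characters are ignored.
    for ch in URL_RE.sub(" ", text):
        if ch in string.punctuation:
            features[f"punct={ch}"] += 1
    # Crude readability proxy (assumption): average token length.
    if tokens:
        features["avg_token_len"] = sum(len(t) for t in tokens) / len(tokens)
    return dict(features)


# Hypothetical post, invented for illustration.
post = "The vaccine is FREE!! See https://example.com now"
feats = extract_features(post)
print(preprocess(post))
print(feats["punct=!"], feats["avg_token_len"])
```

Feature dicts of this shape could then be vectorized and passed to a linear classifier such as a linear SVM; that step is left out here to keep the sketch free of third-party dependencies.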
Taxonomy
Topics: Misinformation and Its Impacts · Topic Modeling · Sentiment Analysis and Opinion Mining
Methods: Support Vector Machine
