Multilingual Models for Check-Worthy Social Media Posts Detection
Sebastian Kula, Michal Gregor

TL;DR
This paper develops and evaluates multilingual transformer models for detecting verifiable factual and harmful claims in social media posts across multiple languages, demonstrating robustness and efficiency.
Contribution
It introduces novel multi-label multilingual classification models capable of detecting both harmful and factual claims simultaneously in social media posts.
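To make "multi-label" concrete, here is a minimal sketch of how a classifier head might score both claim types for a single post at once. It is an illustration under stated assumptions, not the authors' implementation: the label names, the 0.5 threshold, and the helper functions are hypothetical, and the logits stand in for the output of a fine-tuned multilingual transformer. The key point is that each label gets its own sigmoid, so a post can be flagged as factual, harmful, both, or neither.

```python
import math

# Hypothetical label names; the paper's actual label set may differ.
LABELS = ("factual_claim", "harmful_claim")


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def decode_multilabel(logits, threshold=0.5):
    """Turn one logit per label into independent yes/no decisions.

    Unlike softmax, the per-label sigmoids do not compete with each
    other, so any combination of labels can be active for the same post.
    The 0.5 threshold is an assumption for illustration.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return {label: sigmoid(z) >= threshold for label, z in zip(LABELS, logits)}


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, a common multi-label loss.

    Uses the stable form max(z, 0) - z*t + log(1 + exp(-|z|)).
    """
    total = 0.0
    for z, t in zip(logits, targets):
        total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Example: made-up logits for a single post.
decisions = decode_multilabel([2.0, -1.0])
```

In this example, `decisions` marks the post as containing a factual claim but not a harmful one. Passing logits such as `[3.0, 3.0]` would activate both labels, which is what distinguishes this setup from single-label classification.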
Findings
Models outperform state-of-the-art benchmarks
Effective in low-resource languages
Robust detection across multiple languages
Abstract
This work presents an extensive study of transformer-based NLP models for the detection of social media posts that contain verifiable factual claims and harmful claims. The study covers various activities, including dataset collection, dataset pre-processing, architecture selection, setup of settings, model training (fine-tuning), model testing, and implementation. The study includes a comprehensive analysis of different models, with a special focus on multilingual models where the same model is capable of processing social media posts both in English and in low-resource languages such as Arabic, Bulgarian, Dutch, Polish, Czech, and Slovak. The results obtained from the study were validated against state-of-the-art models, and the comparison demonstrated the robustness of the proposed models. The novelty of this work lies in the development of multi-label multilingual classification models that…
Taxonomy
Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection
