Multilingual Models for Check-Worthy Social Media Posts Detection

Sebastian Kula; Michal Gregor

arXiv:2408.06737·cs.CL·August 14, 2024

TL;DR

This paper develops and evaluates multilingual transformer models for detecting verifiable factual and harmful claims in social media posts across multiple languages, demonstrating robustness and efficiency.

Contribution

It introduces novel multi-label multilingual classification models capable of detecting both harmful and factual claims simultaneously in social media posts.
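As a rough illustration of what "multi-label" means here, the sketch below scores each label independently, so a single post can be flagged as containing a factual claim, a harmful claim, both, or neither. It is a minimal sketch and not the authors' implementation: the label names, the 0.5 threshold, and the example logits are all assumptions made for illustration, and in practice the logits would come from a fine-tuned transformer's classification head.

```python
import math

# Hypothetical label names; the source only says the models detect
# verifiable factual claims and harmful claims.
LABELS = ("factual_claim", "harmful_claim")


def sigmoid(x: float) -> float:
    """Squash a raw logit into a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def decode_multilabel(logits, threshold=0.5):
    """Turn one logit per label into independent yes/no decisions.

    Unlike softmax-based single-label classification, each label gets
    its own sigmoid, so several labels can be active for the same post.
    The 0.5 threshold is an assumed default, not a value from the paper.
    """
    if len(logits) != len(LABELS):
        raise ValueError("expected one logit per label")
    return {label: sigmoid(z) >= threshold for label, z in zip(LABELS, logits)}


# Illustrative logits only; these are not outputs of the paper's models.
print(decode_multilabel([2.0, -1.0]))  # factual claim only
print(decode_multilabel([1.5, 0.7]))   # both labels active
```

Scoring the labels independently is what lets one model cover both tasks at once; a softmax over the two labels would force them to compete and could never flag a post as both factual and harmful.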

Findings

01 Models outperform state-of-the-art benchmarks

02 Effective in low-resource languages

03 Robust detection across multiple languages

Abstract

This work presents an extensive study of transformer-based NLP models for detecting social media posts that contain verifiable factual claims and harmful claims. The study covers dataset collection, dataset pre-processing, architecture selection, configuration of settings, model training (fine-tuning), model testing, and implementation. It includes a comprehensive analysis of different models, with a special focus on multilingual models, where a single model can process social media posts both in English and in low-resource languages such as Arabic, Bulgarian, Dutch, Polish, Czech, and Slovak. The results were validated against state-of-the-art models, and the comparison demonstrated the robustness of the proposed models. The novelty of this work lies in the development of multi-label multilingual classification models that…

Taxonomy

Topics: Spam and Phishing Detection · Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection
