Matching Tweets With Applicable Fact-Checks Across Languages
Ashkan Kazemi, Zehua Li, Verónica Pérez-Rosas, Scott A. Hale, Rada Mihalcea

TL;DR
This paper develops methods to automatically match social media claims with relevant fact-checks across multiple languages, using transformer models and retrieval techniques, and introduces a new dataset for this task.
Contribution
It presents a comprehensive study of multilingual and cross-lingual fact-check matching, compares classification and retrieval approaches, and introduces a new dataset for future research.
Findings
Match classification reaches 86% average accuracy across four language pairs.
A BM25 baseline outperforms or matches state-of-the-art multilingual embedding models for retrieval in the monolingual experiments.
Multilingual models face challenges in cross-lingual fact-check retrieval.
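The BM25 baseline ranks candidate fact-checks by lexical overlap with the tweet. The sketch below shows Okapi BM25 scoring under stated assumptions. It uses Unicode-aware word tokenization without stemming and the common defaults k1=1.5 and b=0.75. The function names and the toy fact-check texts are hypothetical, and this section does not give the paper's exact BM25 configuration.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase word tokens; \w is Unicode-aware, no stemming or stopwords.
    return re.findall(r"\w+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document in `docs` against `query` with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc in tokenized:
        df.update(set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # IDF variant with +1 inside the log keeps weights non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            freq = tf[term]
            # Length normalization: longer documents need more matches to score high.
            denom = freq + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * freq * (k1 + 1) / denom
        scores.append(score)
    return scores

def rank_fact_checks(tweet, fact_checks, top_k=3):
    """Return indices of the top_k fact-checks by BM25 score, best first."""
    scores = bm25_scores(tweet, fact_checks)
    order = sorted(range(len(fact_checks)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]

# Hypothetical toy data, not from the paper's dataset.
fact_checks = [
    "claim that drinking hot water cures the virus is false",
    "video of flooded airport is from 2017 not this week",
    "photo of politician at rally was digitally altered",
]
tweet = "Drinking hot water cures the virus, share now"
print(rank_fact_checks(tweet, fact_checks, top_k=1))  # -> [0]
```

Because BM25 relies only on exact term overlap, it needs no training and works well when tweet and fact-check share wording. The same property explains why a purely lexical method cannot directly match a Hindi tweet to an English fact-check.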
Abstract
An important challenge for news fact-checking is the effective dissemination of existing fact-checks. This in turn brings the need for reliable methods to detect previously fact-checked claims. In this paper, we focus on automatically finding existing fact-checks for claims made in social media posts (tweets). We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings using multilingual transformer models such as XLM-RoBERTa and multilingual embeddings such as LaBSE and SBERT. We present promising results for "match" classification (86% average accuracy) in four language pairs. We also find that a BM25 baseline outperforms or is on par with state-of-the-art multilingual embedding models for the retrieval task during our monolingual experiments. We highlight and discuss NLP…
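Embedding models such as LaBSE and SBERT support retrieval by encoding tweets and fact-checks into a shared vector space and ranking candidates by similarity. The sketch below covers only that ranking step. It assumes the embeddings have already been computed and that cosine similarity is the scoring function. The 3-dimensional vectors are hypothetical stand-ins for real encoder output, which has many more dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_by_embedding(query_vec, doc_vecs):
    """Return candidate indices sorted by cosine similarity to the query, best first."""
    sims = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: sims[i], reverse=True)

# Hypothetical pre-computed embeddings (a real encoder would produce these).
tweet_vec = [0.9, 0.1, 0.0]
fact_check_vecs = [
    [0.1, 0.9, 0.2],
    [0.8, 0.2, 0.1],
    [0.0, 0.1, 0.9],
]
print(rank_by_embedding(tweet_vec, fact_check_vecs))  # -> [1, 0, 2]
```

With a multilingual encoder, the tweet and the fact-checks can be in different languages, because both are mapped into the same space before comparison. Retrieval quality then depends entirely on how well the encoder aligns the two languages.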
Taxonomy
Topics: Misinformation and Its Impacts · Topic Modeling · Hate Speech and Cyberbullying Detection
Methods: Sentence-BERT
