Matching Tweets With Applicable Fact-Checks Across Languages

Ashkan Kazemi; Zehua Li; Verónica Pérez-Rosas; Scott A. Hale; Rada Mihalcea

arXiv:2202.07094 · cs.CL · June 14, 2022 · 5 cites

Open Access

TL;DR

This paper develops methods to automatically match social media claims with relevant fact-checks across multiple languages, using transformer models and retrieval techniques, and introduces a new dataset for this task.

Contribution

It presents a comprehensive study of multilingual and cross-lingual fact-check matching, compares classification and retrieval approaches, and introduces a new dataset for future research.

Findings

01

Match classification reaches 86% average accuracy across four language pairs.

02

A BM25 baseline outperforms or matches state-of-the-art multilingual embedding models for retrieval in the monolingual experiments.

03

Multilingual models face challenges in cross-lingual fact-check retrieval.

Abstract

An important challenge for news fact-checking is the effective dissemination of existing fact-checks. This in turn brings the need for reliable methods to detect previously fact-checked claims. In this paper, we focus on automatically finding existing fact-checks for claims made in social media posts (tweets). We conduct both classification and retrieval experiments, in monolingual (English only), multilingual (Spanish, Portuguese), and cross-lingual (Hindi-English) settings using multilingual transformer models such as XLM-RoBERTa and multilingual embeddings such as LaBSE and SBERT. We present promising results for "match" classification (86% average accuracy) in four language pairs. We also find that a BM25 baseline outperforms or is on par with state-of-the-art multilingual embedding models for the retrieval task during our monolingual experiments. We highlight and discuss NLP…
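To make the retrieval setting concrete, here is a minimal sketch of how a BM25 baseline could rank candidate fact-checks against a tweet. It is not the authors' implementation: the tokenizer, the `k1` and `b` defaults, the function names, and the toy tweet and fact-check texts are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on non-word characters (a deliberate simplification).
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return (score, doc_index) pairs sorted from best to worst match."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs

    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in tokenized:
        df.update(set(d))

    query_terms = set(tokenize(query))
    scores = []
    for idx, d in enumerate(tokenized):
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF that stays positive even for very common terms.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append((score, idx))
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Hypothetical fact-check headlines and tweet, invented for this example.
fact_checks = [
    "No, drinking hot water does not cure the flu",
    "The photo of a shark on a flooded highway is digitally altered",
    "Claim that the city banned bicycles is false",
]
tweet = "omg a shark swimming on the highway after the flood!!"

ranked = bm25_rank(tweet, fact_checks)
best_score, best_index = ranked[0]
print(fact_checks[best_index])
```

Because BM25 relies on exact token overlap, a sketch like this only applies when the tweet and the fact-check share a language; the cross-lingual setting needs multilingual embeddings or translation instead.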

Taxonomy

Topics: Misinformation and Its Impacts · Topic Modeling · Hate Speech and Cyberbullying Detection

Methods: Sentence-BERT