Multilingual vs Crosslingual Retrieval of Fact-Checked Claims: A Tale of Two Approaches
Alan Ramponi, Marco Rovera, Robert Moro, Sara Tonelli

TL;DR
This paper compares multilingual and crosslingual retrieval methods for fact-checked claims, highlighting the effectiveness of LLM-based re-ranking and negative sampling strategies across 47 languages.
Contribution
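It examines two strategies for improving claim retrieval across languages, negative example selection in the supervised setting and re-ranking in the unsupervised setting, and shows that the crosslingual and multilingual setups have distinct characteristics.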
Findings
LLM-based re-ranking yields the best retrieval performance.
Negative example sampling improves supervised retrieval.
Crosslingual retrieval has unique challenges compared to multilingual retrieval.
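To make the two strategies named in the findings concrete, here is a minimal sketch of a retrieve-then-re-rank pipeline with hard negative selection. It is not the authors' implementation. The bag-of-words `embed` function stands in for a multilingual text encoder, the `scorer` callback stands in for an LLM relevance judgment, and all function names and the toy data are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, List, Set


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: a real system would use a multilingual encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(post: str, claims: List[str], k: int) -> List[int]:
    """First stage: indices of the k claims most similar to the post."""
    query = embed(post)
    ranked = sorted(
        range(len(claims)),
        key=lambda i: cosine(query, embed(claims[i])),
        reverse=True,
    )
    return ranked[:k]


def rerank(
    post: str,
    claims: List[str],
    candidates: List[int],
    scorer: Callable[[str, str], float],
) -> List[int]:
    """Second stage: reorder the candidates by a stronger relevance scorer.

    `scorer` is a placeholder for an LLM-based judgment of (post, claim) relevance.
    """
    return sorted(candidates, key=lambda i: scorer(post, claims[i]), reverse=True)


def hard_negatives(post: str, claims: List[str], positives: Set[int], n: int) -> List[int]:
    """Pick the n highest-ranked claims that are NOT labeled as matches.

    These look similar to the post but are wrong, which makes them
    informative negatives when training a supervised retriever.
    """
    ranked = retrieve(post, claims, len(claims))
    return [i for i in ranked if i not in positives][:n]


# Hypothetical toy data, for illustration only.
CLAIMS = [
    "vaccines cause autism",
    "drinking bleach cures the virus",
    "the virus was created in a lab",
    "5g towers spread the virus",
]
POST = "someone said drinking bleach cures the virus fast"

candidates = retrieve(POST, CLAIMS, k=3)
negatives = hard_negatives(POST, CLAIMS, positives={1}, n=2)
```

The first stage is kept cheap so that the more expensive scorer only sees a short candidate list. In a crosslingual setting the post and the claims are in different languages, so the lexical stand-in used here would no longer work and a shared multilingual representation would be required.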
Abstract
Retrieval of previously fact-checked claims is a well-established task, whose automation can assist professional fact-checkers in the initial steps of information verification. Previous works have mostly tackled the task monolingually, i.e., having both the input and the retrieved claims in the same language. However, especially for languages with a limited availability of fact-checks and in the case of global narratives, such as pandemics, wars, or international politics, it is crucial to be able to retrieve claims across languages. In this work, we examine strategies to improve multilingual and crosslingual performance, namely selection of negative examples (in the supervised setting) and re-ranking (in the unsupervised setting). We evaluate all approaches on a dataset containing posts and claims in 47 languages (283 language combinations). We observe that the best results are obtained by…
Taxonomy
Topics: Artificial Intelligence in Law · Linguistics and Terminology Studies · Multi-Agent Systems and Negotiation
