Claim Matching Beyond English to Scale Global Fact-Checking

Ashkan Kazemi; Kiran Garimella; Devin Gaffney; Scott A. Hale

arXiv:2106.00853 · cs.CL · June 3, 2021

TL;DR

This paper introduces a multilingual claim matching approach to scale fact-checking across languages. It contributes a novel dataset of WhatsApp tipline and public group messages and a custom embedding model that outperforms the multilingual models LASER and LaBSE.

Contribution

The paper presents a new multilingual dataset for claim matching and a custom embedding model trained with knowledge distillation, and it demonstrates improved performance over LASER and LaBSE.
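To make the knowledge distillation idea concrete, here is a minimal sketch of one common formulation, in which a "student" model is trained so that its embeddings match those of a "teacher" under a mean squared error loss. This summary does not state the exact objective the paper uses, so treat the loss, the toy linear student, and the names `distillation_loss` and `train_linear_student` as illustrative assumptions; the input features and teacher embeddings are random placeholders, not data from the paper.

```python
import numpy as np

def distillation_loss(student_vecs, teacher_vecs):
    """Mean squared error between student and teacher embeddings."""
    return float(np.mean((student_vecs - teacher_vecs) ** 2))

def train_linear_student(features, teacher_vecs, steps=200, lr=0.1):
    """Fit a toy linear student (features @ weights) by gradient descent.

    A real student would be a multilingual sentence encoder; a linear map
    is used here only to keep the sketch self-contained.
    """
    weights = np.zeros((features.shape[1], teacher_vecs.shape[1]))
    for _ in range(steps):
        error = features @ weights - teacher_vecs
        # Gradient of mean((features @ weights - teacher_vecs) ** 2) w.r.t. weights.
        grad = 2.0 * features.T @ error / error.size
        weights -= lr * grad
    return weights

# Placeholder data (assumption): random vectors stand in for sentence
# features and for the teacher model's embeddings of the same sentences.
rng = np.random.default_rng(0)
features = rng.normal(size=(32, 8))
teacher_vecs = rng.normal(size=(32, 4))

initial_loss = distillation_loss(np.zeros_like(teacher_vecs), teacher_vecs)
weights = train_linear_student(features, teacher_vecs)
final_loss = distillation_loss(features @ weights, teacher_vecs)
print(f"loss before: {initial_loss:.3f}, after: {final_loss:.3f}")
```

The point of the sketch is only that the student is supervised by the teacher's output vectors rather than by task labels, which is what lets a strong teacher transfer embedding quality to languages where the student would otherwise be weak.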

Findings

01. Our model exceeds LASER and LaBSE in claim matching accuracy.

02. The dataset covers high-resource (English, Hindi) and lower-resource (Bengali, Malayalam, Tamil) languages.

03. We release datasets, code, and models for future research.

Abstract

Manual fact-checking does not scale well to serve the needs of the internet. This issue is further compounded in non-English contexts. In this paper, we discuss claim matching as a possible solution to scale fact-checking. We define claim matching as the task of identifying pairs of textual messages containing claims that can be served with one fact-check. We construct a novel dataset of WhatsApp tipline and public group messages alongside fact-checked claims that are first annotated for containing "claim-like statements" and then matched with potentially similar items and annotated for claim matching. Our dataset contains content in high-resource (English, Hindi) and lower-resource (Bengali, Malayalam, Tamil) languages. We train our own embedding model using knowledge distillation and a high-quality "teacher" model in order to address the imbalance in embedding quality between the low-…
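The task definition above can be sketched as a nearest-neighbour search over sentence embeddings: a message is matched to the most similar fact-checked claim if their cosine similarity clears a threshold. This is a minimal illustration, not the paper's pipeline; the function `match_claims`, the threshold of 0.8, and the three-dimensional toy vectors are all assumptions made for the example, and in practice the vectors would come from a multilingual embedding model.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def match_claims(message_vecs, fact_check_vecs, threshold=0.8):
    """Return, per message, the index of the best-matching fact-check or None."""
    sims = cosine_similarity_matrix(message_vecs, fact_check_vecs)
    best = sims.argmax(axis=1)
    return [
        int(j) if sims[i, j] >= threshold else None
        for i, j in enumerate(best)
    ]

# Toy embeddings (assumption): hand-written vectors in place of model output.
messages = np.array([
    [1.0, 0.0, 0.0],   # close to fact-check 0
    [0.0, 1.0, 0.1],   # close to fact-check 1
    [0.0, 0.0, 1.0],   # unrelated to both fact-checks
])
fact_checks = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
])

matches = match_claims(messages, fact_checks)
print(matches)
```

Messages that return `None` have no sufficiently similar fact-check and would still need manual review; the threshold trades off missed matches against incorrect ones and would need tuning on annotated pairs.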

Taxonomy

Methods: Knowledge Distillation