Robust Information Retrieval for False Claims with Distracting Entities In Fact Extraction and Verification
Mingwen Dong, Christos Christodoulopoulos, Sheng-Min Shih, Xiaofei Ma

TL;DR
This paper investigates the challenges false claims pose to evidence retrieval in fact checking, revealing that irrelevant entities in false claims hinder retrieval accuracy, and proposes data augmentation and model ensemble techniques to improve robustness.
Contribution
It identifies the impact of irrelevant entities in false claims on retrieval performance and introduces data augmentation and model ensemble methods to enhance robustness.
Findings
Retrieval models perform worse on false claims with irrelevant entities.
Data augmentation with synthetic false claims improves recall.
Model ensemble strategies increase evidence retrieval accuracy.
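The summary does not say how the ensemble combines its member models, so the following is only a minimal sketch of one common approach: min-max normalizing each model's scores and averaging them. The function names (`min_max_normalize`, `ensemble_rank`) and the averaging rule are assumptions for illustration, not the authors' method.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one model's scores to [0, 1] so models are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tied: no ranking signal from this model.
        return {k: 0.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def ensemble_rank(model_scores: List[Dict[str, float]], top_k: int) -> List[str]:
    """Average normalized scores across models and return the top_k sentence ids.

    Assumption: a candidate missing from a model's output counts as 0.0.
    """
    normalized = [min_max_normalize(s) for s in model_scores]
    candidates = set().union(*normalized)
    averaged = {
        c: sum(n.get(c, 0.0) for n in normalized) / len(normalized)
        for c in candidates
    }
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(averaged, key=lambda c: (-averaged[c], c))[:top_k]


# Hypothetical scores from two retrieval models over three candidate sentences.
model_a = {"s1": 0.9, "s2": 0.2, "s3": 0.1}
model_b = {"s1": 0.3, "s2": 0.8, "s3": 0.1}
top_two = ensemble_rank([model_a, model_b], top_k=2)
```

Averaging normalized scores is the simplest choice here; weighted averaging or rank fusion would fit the same interface.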
Abstract
Accurate evidence retrieval is essential for automated fact checking. Little previous research has focused on the differences between true and false claims and how they affect evidence retrieval. This paper shows that, compared with true claims, false claims more frequently contain irrelevant entities, which can distract the evidence retrieval model. A BERT-based retrieval model made more mistakes in retrieving refuting evidence for false claims than in retrieving supporting evidence for true claims. When tested with synthetically generated adversarial false claims containing irrelevant entities, the recall of the retrieval model is significantly lower than that for the original claims. These results suggest that the vanilla BERT-based retrieval model is not robust to irrelevant entities in false claims. By augmenting the training data with synthetic false claims containing irrelevant entities, the…
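To make the idea of a synthetic false claim with an irrelevant entity concrete, here is a minimal sketch under stated assumptions. The abstract does not describe the generation procedure, so swapping one entity mention for an unrelated one is only an illustrative stand-in, and `make_synthetic_false_claim` and `recall_at_k` are hypothetical helper names.

```python
import random
from typing import List, Set


def make_synthetic_false_claim(
    claim: str, target_entity: str, distractors: List[str], rng: random.Random
) -> str:
    """Swap one entity mention for an unrelated entity.

    Assumption: entity substitution is used here purely for illustration;
    the paper's actual generation method is not given in the abstract.
    """
    if target_entity not in claim:
        raise ValueError("target_entity does not occur in claim")
    choices = [d for d in distractors if d != target_entity and d not in claim]
    if not choices:
        raise ValueError("no usable distracting entity")
    # Replace only the first mention so the rest of the claim is unchanged.
    return claim.replace(target_entity, rng.choice(choices), 1)


def recall_at_k(retrieved: List[str], gold: Set[str], k: int) -> float:
    """Fraction of gold evidence ids found among the top-k retrieved ids."""
    if not gold:
        return 0.0
    return len(set(retrieved[:k]) & gold) / len(gold)


rng = random.Random(0)
synthetic = make_synthetic_false_claim(
    "Paris is the capital of France.", "France", ["Brazil"], rng
)
score = recall_at_k(["a", "b", "c"], gold={"a", "c"}, k=2)
```

Synthetic claims produced this way could be added to the training set alongside the original claims, and recall measured separately on original and synthetic claims to compare robustness.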
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Explainable Artificial Intelligence (XAI)
