Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking

Mubashara Akhtar; Michael Schlichtkrull; Andreas Vlachos

arXiv:2411.05375·cs.CL·July 22, 2025

TL;DR

Ev2R introduces an evaluation metric for retrieved evidence in automated fact-checking that combines reference-based evaluation with verdict-level proxy scoring, improving accuracy and robustness over existing methods.

Contribution

The paper proposes Ev2R, a new evidence evaluation metric that jointly assesses evidence relevance and support strength, outperforming prior approaches in correlation with human judgments.

Findings

01 Ev2R outperforms existing metrics in accuracy.

02 Ev2R shows higher correlation with human ratings.

03 Ev2R is more robust to adversarial perturbations.
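
A claim of higher correlation with human ratings is typically checked by ranking the same set of examples by metric score and by human score, then comparing the two rankings. The sketch below is a minimal, dependency-free Spearman rank correlation; the function names and the sample numbers are illustrative assumptions, not data or code from the paper, and the paper may use a different correlation statistic.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes both inputs have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(metric_scores, human_ratings):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(metric_scores), ranks(human_ratings))


# Hypothetical scores for four evidence sets (made-up numbers).
metric_scores = [0.10, 0.40, 0.35, 0.80]
human_ratings = [1, 3, 2, 5]
rho = spearman(metric_scores, human_ratings)
```

Here the metric and the human raters order the four examples identically, so `rho` comes out at 1; a metric that ranked them in the opposite order would give -1.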

Abstract

Current automated fact-checking (AFC) approaches typically evaluate evidence either implicitly via the predicted verdicts or through exact matches with predefined closed knowledge sources, such as Wikipedia. However, these methods are limited due to their reliance on evaluation metrics originally designed for other purposes and constraints from closed knowledge sources. In this work, we introduce Ev2R, which combines the strengths of reference-based evaluation and verdict-level proxy scoring. Ev2R jointly assesses how well the evidence aligns with the gold references and how reliably it supports the verdict, addressing the shortcomings of prior methods. We evaluate Ev2R against three types of evidence evaluation approaches: reference-based, proxy-reference, and reference-less…
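
To make the idea of scoring evidence on two axes concrete, here is a toy sketch. It is not the authors' implementation: token-overlap F1 stands in for alignment with gold references, a simple label match stands in for verdict-level scoring, and the weighted average (including the hypothetical `weight` parameter) is an assumption chosen only for illustration.

```python
from collections import Counter


def token_f1(candidate, reference):
    """Token-overlap F1 between two strings (a crude alignment stand-in)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(cand)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def combined_score(retrieved, gold, predicted_verdict, gold_verdict, weight=0.5):
    """Toy combination of reference alignment and verdict agreement.

    `weight` is a hypothetical mixing parameter, not taken from the paper.
    """
    alignment = token_f1(retrieved, gold)
    verdict_agreement = 1.0 if predicted_verdict == gold_verdict else 0.0
    return weight * alignment + (1 - weight) * verdict_agreement


score = combined_score(
    retrieved="the sky is blue",
    gold="the sky is blue",
    predicted_verdict="supported",
    gold_verdict="supported",
)
```

With identical evidence and matching verdicts the toy score is 1.0; if the verdict predicted from the retrieved evidence disagreed with the gold verdict, the same evidence would score 0.5 under the default weight.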

Taxonomy

Topics: Topic Modeling · Software Engineering Research · Biomedical Text Mining and Ontologies