Ev2R: Evaluating Evidence Retrieval in Automated Fact-Checking
Mubashara Akhtar, Michael Schlichtkrull, Andreas Vlachos

TL;DR
Ev2R is an evidence evaluation metric for automated fact-checking that combines reference-based evaluation with verdict-level proxy scoring, improving accuracy and robustness over existing methods.
Contribution
The paper proposes Ev2R, a new evidence evaluation metric that jointly assesses how well retrieved evidence aligns with the gold references and how reliably it supports the verdict, outperforming prior approaches in correlation with human judgments.
Findings
Ev2R outperforms existing metrics in accuracy.
Ev2R shows higher correlation with human ratings.
Ev2R is more robust to adversarial perturbations.
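The "correlation with human ratings" finding can be illustrated with a small sketch. It assumes Spearman rank correlation as the statistic, which this summary does not confirm, and the scores and ratings below are hypothetical values made up for illustration, not results from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: one metric score and one human rating per evidence set.
metric_scores = [0.9, 0.2, 0.6, 0.4, 0.8]
human_ratings = [5, 1, 3, 2, 4]
rho = spearman(metric_scores, human_ratings)
```

Under this assumed setup, a metric whose scores rank the evidence sets in the same order as the human annotators yields a coefficient near 1, and comparing metrics amounts to comparing their coefficients on the same rated sample.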
Abstract
Current automated fact-checking (AFC) approaches typically evaluate evidence either implicitly via the predicted verdicts or through exact matches with predefined closed knowledge sources, such as Wikipedia. However, these methods are limited due to their reliance on evaluation metrics originally designed for other purposes and constraints from closed knowledge sources. In this work, we introduce Ev²R, which combines the strengths of reference-based evaluation and verdict-level proxy scoring. Ev²R jointly assesses how well the evidence aligns with the gold references and how reliably it supports the verdict, addressing the shortcomings of prior methods. We evaluate Ev²R against three types of evidence evaluation approaches: reference-based, proxy-reference, and reference-less…
Taxonomy
Topics: Topic Modeling · Software Engineering Research · Biomedical Text Mining and Ontologies
