Face the Facts! Evaluating RAG-based Pipelines for Professional Fact-Checking
Daniel Russo, Stefano Menini, Jacopo Staiano, Marco Guerini

TL;DR
This paper benchmarks RAG-based pipelines for professional fact-checking, analyzing their performance on complex claims and diverse knowledge bases, revealing strengths and limitations of different retrieval and generation models.
Contribution
It lifts several constraints of current state-of-the-art RAG-based fact-checking pipelines and provides a comprehensive evaluation that follows professional fact-checking practices.
Findings
LLM-based retrievers outperform other retrieval methods
Larger models improve verdict faithfulness
Smaller models better adhere to context
Abstract
Natural Language Processing and Generation systems have recently shown the potential to complement and streamline the costly and time-consuming job of professional fact-checkers. In this work, we lift several constraints of current state-of-the-art pipelines for automated fact-checking based on the Retrieval-Augmented Generation (RAG) paradigm. Our goal is to benchmark, following professional fact-checking practices, RAG-based methods for the generation of verdicts - i.e., short texts discussing the veracity of a claim - evaluating them on stylistically complex claims and heterogeneous, yet reliable, knowledge bases. Our findings show a complex landscape, where, for example, LLM-based retrievers outperform other retrieval techniques, though they still struggle with heterogeneous knowledge bases; larger models excel in verdict faithfulness, while smaller models provide better context…
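The pipelines benchmarked here follow the RAG paradigm: retrieve evidence for a claim from a knowledge base, then condition a generator on that evidence to write a verdict. The sketch below illustrates only that two-stage shape, under several assumptions. Okapi BM25, a classic lexical retriever, stands in for the retrievers the paper compares and is not the authors' method. The toy knowledge base and the helpers `retrieve` and `build_verdict_prompt` are hypothetical. The call to a generator model is omitted because the abstract does not specify one.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric word tokens only.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score every document against the query with Okapi BM25 (illustrative retriever)."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed IDF variant that is always positive.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            num = tf[term] * (k1 + 1)
            den = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * num / den
        scores.append(score)
    return scores


def retrieve(claim: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    # Hypothetical helper: return the top-k passages for the claim.
    scores = bm25_scores(claim, knowledge_base)
    ranked = sorted(range(len(knowledge_base)), key=lambda i: scores[i], reverse=True)
    return [knowledge_base[i] for i in ranked[:k]]


def build_verdict_prompt(claim: str, evidence: list[str]) -> str:
    # Hypothetical helper: ground the generator in the retrieved evidence only.
    context = "\n".join(f"[{i + 1}] {doc}" for i, doc in enumerate(evidence))
    return (
        "You are a professional fact-checker. Using only the evidence below, "
        "write a short verdict discussing the veracity of the claim.\n\n"
        f"Evidence:\n{context}\n\nClaim: {claim}\nVerdict:"
    )


# Toy usage: the prompt would then be sent to a generator model (not shown).
kb = [
    "The Eiffel Tower is located in Paris and was completed in 1889.",
    "Mount Everest is the highest mountain above sea level.",
    "The Eiffel Tower is about 330 metres tall.",
]
claim = "The Eiffel Tower was completed in 1889 in Paris."
evidence = retrieve(claim, kb, k=2)
prompt = build_verdict_prompt(claim, evidence)
print(prompt)
```

Restricting the prompt to the retrieved evidence is what makes faithfulness and context adherence, two of the properties compared above, measurable at all: the verdict can be checked against a known, bounded context.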
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
