(Fact) Check Your Bias
Eivind Morris Bakke, Nora Winger Heggelund

TL;DR
This paper investigates how biases in large language models influence automatic fact verification. Prompting strategies change which evidence is retrieved but leave final verdicts stable, which points to both inherent model bias and robustness in outcomes.
Contribution
It demonstrates the impact of parametric and prompted biases in LLMs on fact-checking outcomes, providing insights into evidence retrieval and verdict stability.
Findings
When prompted directly, Llama 3.1 labels nearly half the claims 'Not Enough Evidence'.
Prompting strategies significantly alter evidence retrieval results.
Final verdicts remain stable despite evidence differences.
Abstract
Automatic fact verification systems increasingly rely on large language models (LLMs). We investigate how parametric knowledge biases in these models affect fact-checking outcomes of the HerO system (baseline for FEVER-25). We examine how the system is affected by: (1) potential bias in Llama 3.1's parametric knowledge and (2) intentionally injected bias. When prompted directly to perform fact verification, Llama 3.1 labels nearly half the claims as "Not Enough Evidence". Using only its parametric knowledge, it is able to reach a verdict on the remaining half of the claims. In the second experiment, we prompt the model to generate supporting, refuting, or neutral fact-checking documents. These prompts significantly influence retrieval outcomes, with approximately 50% of retrieved evidence being unique to each perspective. Notably, the model sometimes refuses to generate supporting…
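One way to read the "unique to each perspective" figure is as the share of a perspective's retrieved evidence that no other perspective retrieved. The sketch below illustrates that reading only. It is not the authors' code, the function name `unique_share` and the evidence IDs are hypothetical, and the paper may define uniqueness differently.

```python
from typing import Dict, Set


def unique_share(retrieved: Dict[str, Set[str]]) -> Dict[str, float]:
    """For each perspective, return the fraction of its evidence IDs
    that were not retrieved under any other perspective."""
    shares = {}
    for perspective, evidence in retrieved.items():
        # Pool everything retrieved under the other perspectives.
        others = set().union(
            *(ev for name, ev in retrieved.items() if name != perspective)
        )
        # Guard against an empty retrieval set to avoid dividing by zero.
        shares[perspective] = (
            len(evidence - others) / len(evidence) if evidence else 0.0
        )
    return shares


# Toy data (invented for illustration, not taken from the paper).
retrieved = {
    "supporting": {"e1", "e2", "e3", "e4"},
    "refuting": {"e3", "e4", "e5", "e6"},
    "neutral": {"e3", "e4", "e7", "e8"},
}
shares = unique_share(retrieved)
print(shares)
```

In this toy example each perspective shares two of its four evidence items with the others, so every share comes out at 0.5. Real retrieval results would be keyed by whatever document or sentence identifiers the retrieval pipeline produces.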
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Misinformation and Its Impacts
