Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System

Haorui He; Yupeng Li; Bin Benjamin Zhu; Dacheng Wen; Reynold Cheng; Francis C. M. Lau

arXiv:2508.06059·cs.CR·November 18, 2025

TL;DR

This paper introduces Fact2Fiction, a novel poisoning attack framework targeting advanced agentic fact-checking systems that use LLMs, revealing significant security vulnerabilities and emphasizing the need for defenses.

Contribution

The work presents the first targeted poisoning attack framework for agentic fact-checking systems, demonstrating its effectiveness and exposing critical security weaknesses.

Findings

01

Fact2Fiction achieves 8.9%-21.2% higher attack success rates than state-of-the-art attacks across various poisoning budgets.

02

The attack exposes vulnerabilities in state-of-the-art fact-checking systems.

03

The study highlights the need for improved defensive measures.

Abstract

State-of-the-art (SOTA) fact-checking systems combat misinformation by employing autonomous LLM-based agents to decompose complex claims into smaller sub-claims, verify each sub-claim individually, and aggregate the partial results to produce verdicts with justifications (explanations for the verdicts). The security of these systems is crucial, as compromised fact-checkers can amplify misinformation, but it remains largely underexplored. To bridge this gap, this work introduces a novel threat model against such fact-checking systems and presents Fact2Fiction, the first poisoning attack framework targeting SOTA agentic fact-checking systems. Fact2Fiction employs LLMs to mimic the decomposition strategy and exploits system-generated justifications to craft tailored malicious evidence that compromises sub-claim verification. Extensive experiments demonstrate that Fact2Fiction achieves…
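
The decompose, verify, and aggregate steps described in the abstract can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the paper's implementation or any real system's API: the names `SubVerdict`, `decompose`, `verify`, `aggregate`, and `fact_check` are hypothetical, the LLM-based decomposition is replaced by a split on " and ", and verification is replaced by a substring match against a small evidence corpus.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SubVerdict:
    """Partial result for one sub-claim (hypothetical structure)."""
    sub_claim: str
    supported: bool
    justification: str


def decompose(claim: str) -> List[str]:
    # Assumption: stand-in for LLM decomposition; splits on " and ".
    return [part.strip() for part in claim.split(" and ") if part.strip()]


def verify(sub_claim: str, corpus: List[str]) -> SubVerdict:
    # Assumption: toy verifier; a sub-claim counts as supported when
    # some passage in the corpus contains it (case-insensitive).
    hits = [p for p in corpus if sub_claim.lower() in p.lower()]
    if hits:
        return SubVerdict(sub_claim, True, f"Found in: {hits[0]!r}")
    return SubVerdict(sub_claim, False, "No matching passage in the corpus")


def aggregate(partials: List[SubVerdict]) -> str:
    # The overall verdict is "supported" only if every sub-claim is.
    return "supported" if all(v.supported for v in partials) else "not supported"


def fact_check(claim: str, corpus: List[str]) -> Tuple[str, List[SubVerdict]]:
    partials = [verify(s, corpus) for s in decompose(claim)]
    return aggregate(partials), partials


corpus = [
    "Records show the bridge opened in 1990.",
    "Surveys confirm the bridge is 2 km long.",
]
claim = "the bridge opened in 1990 and the bridge is 2 km long"

verdict, partials = fact_check(claim, corpus)
for p in partials:
    print(p.sub_claim, "->", p.supported, "|", p.justification)
print("verdict:", verdict)
```

In this sketch the final verdict and the per-sub-claim justifications depend entirely on what the evidence corpus contains, which is why the threat model in the abstract treats the evidence consulted during sub-claim verification as the attack surface.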

Taxonomy

Topics: Advanced Malware Detection Techniques