CAMOUFLAGE: Exploiting Misinformation Detection Systems Through LLM-driven Adversarial Claim Transformation

Mazal Bethany; Nishant Vishwamitra; Cho-Yu Jason Chiang; Peyman Najafirad

arXiv:2505.01900 · cs.CL · May 6, 2025

TL;DR

CAMOUFLAGE is an LLM-driven adversarial attack method that rewrites claims to bypass evidence-based misinformation detection systems by manipulating their retrieval and comparison modules, achieving an average attack success rate of nearly 47%.

Contribution

It introduces a novel two-agent iterative approach for generating semantically equivalent adversarial claims that can fool complex misinformation detectors without relying on gradient information.
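The control flow of such a two-agent loop can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: `two_agent_attack`, `AttackResult`, the feedback strings, and the toy stand-ins are all hypothetical. In CAMOUFLAGE the Prompt Optimization Agent and the Attacker Agent are LLMs; here they are plain callables so the loop runs without a model, and the detector is queried only for its predicted label.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AttackResult:
    success: bool
    claim: str  # final adversarial claim, or the original claim if the attack failed
    iterations: int
    history: List[str] = field(default_factory=list)


def two_agent_attack(
    claim: str,
    true_label: str,
    detector: Callable[[str], str],             # black-box detector: claim -> predicted label
    attacker: Callable[[str, str], str],        # Attacker Agent stand-in: (instructions, claim) -> rewrite
    optimizer: Callable[[str, str, str], str],  # Prompt Optimization Agent stand-in (hypothetical signature)
    is_equivalent: Callable[[str, str], bool],  # hypothetical semantic-equivalence check
    instructions: str = "Rewrite the claim without changing its meaning.",
    max_iters: int = 5,
) -> AttackResult:
    history: List[str] = []
    for step in range(1, max_iters + 1):
        # Attacker Agent proposes a rewrite under the current instructions.
        candidate = attacker(instructions, claim)
        history.append(candidate)
        if not is_equivalent(claim, candidate):
            feedback = "The rewrite changed the claim's meaning."
        elif detector(candidate) != true_label:
            # The detector no longer returns the correct verdict: the attack succeeded.
            return AttackResult(True, candidate, step, history)
        else:
            feedback = f"The detector still predicts {true_label}."
        # Prompt Optimization Agent refines the attacker's instructions from the feedback.
        instructions = optimizer(instructions, candidate, feedback)
    return AttackResult(False, claim, max_iters, history)


# Toy stand-ins (hypothetical) so the loop runs without an LLM.
def toy_detector(c: str) -> str:
    return "REFUTED" if "5G" in c else "NOT ENOUGH INFO"


def toy_attacker(instructions: str, c: str) -> str:
    if "expand acronyms" in instructions.lower():
        return c.replace("5G", "fifth-generation wireless")
    return c


def toy_optimizer(instructions: str, rewrite: str, feedback: str) -> str:
    return instructions + " Expand acronyms."


result = two_agent_attack(
    "5G towers spread viruses.", "REFUTED",
    toy_detector, toy_attacker, toy_optimizer, lambda a, b: True,
)
print(result.success, result.iterations, result.claim)
# → True 2 fifth-generation wireless towers spread viruses.
```

Because the detector is only asked for a label, nothing in the loop needs gradient information. The semantic-equivalence check is where a real system would plug in an embedding-similarity or entailment model; that choice is an assumption here, not something stated in this summary.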

Findings

01

Achieves an average attack success rate of 46.92%.

02

Effectively manipulates evidence retrieval and comparison modules.

03

Preserves the semantic integrity of the original claims.
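The 46.92% figure is an average attack success rate (ASR). This summary does not define ASR or say what the average is taken over, so the sketch below assumes a common convention: only claims the detector classified correctly before the attack count as targets, an attack succeeds when the rewrite's prediction differs from the gold label, and the average is an unweighted mean over evaluation settings. The function names, setting names, and toy labels are hypothetical and do not reproduce any result from the paper.

```python
from statistics import mean
from typing import Dict, Sequence


def attack_success_rate(gold: Sequence[str], pred_orig: Sequence[str], pred_adv: Sequence[str]) -> float:
    """Share of originally correct predictions that the adversarial rewrite turns into wrong ones."""
    if not (len(gold) == len(pred_orig) == len(pred_adv)):
        raise ValueError("gold, pred_orig and pred_adv must have the same length")
    # Assumption: only claims the detector got right before the attack are attack targets.
    targets = [i for i, (g, p) in enumerate(zip(gold, pred_orig)) if g == p]
    if not targets:
        return 0.0
    flipped = sum(1 for i in targets if pred_adv[i] != gold[i])
    return flipped / len(targets)


def average_asr(per_setting: Dict[str, float]) -> float:
    """Unweighted mean of per-setting ASRs (e.g. one entry per detector/dataset pair)."""
    return mean(per_setting.values())


gold = ["REFUTED", "SUPPORTED", "REFUTED", "SUPPORTED"]
pred_orig = ["REFUTED", "SUPPORTED", "SUPPORTED", "SUPPORTED"]     # 3 of 4 correct before the attack
pred_adv = ["NOT ENOUGH INFO", "SUPPORTED", "REFUTED", "REFUTED"]  # 2 of those 3 flipped
asr = attack_success_rate(gold, pred_orig, pred_adv)
print(round(asr, 4))  # 0.6667
print(average_asr({"setting_a": 0.5, "setting_b": 0.25}))  # 0.375
```

Excluding claims the detector already gets wrong keeps the metric from crediting the attacker for pre-existing errors; whether the paper uses this exact denominator is not stated here.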

Abstract

Automated evidence-based misinformation detection systems, which evaluate the veracity of short claims against evidence, lack comprehensive analysis of their adversarial vulnerabilities. Existing black-box text-based adversarial attacks are ill-suited for evidence-based misinformation detection systems, as these attacks primarily focus on token-level substitutions involving gradient or logit-based optimization strategies, which are incapable of fooling the multi-component nature of these detection systems. These systems incorporate both retrieval and claim-evidence comparison modules, requiring attacks to break the retrieval of evidence and/or the comparison module so that it draws incorrect inferences. We present CAMOUFLAGE, an iterative, LLM-driven approach that employs a two-agent system, a Prompt Optimization Agent and an Attacker Agent, to create adversarial claim rewritings…
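To make the retrieval/comparison split concrete, here is a toy evidence-based detector. It is a sketch under heavy assumptions and not any system evaluated in the paper: retrieval is top-k Jaccard overlap with a score threshold, the comparison module is a crude negation-cue rule standing in for an NLI model, and the FEVER-style labels (`SUPPORTED`, `REFUTED`, `NOT ENOUGH INFO`) and all function names are hypothetical. It shows one failure mode the abstract describes: a meaning-preserving rewrite that shares no words with the evidence breaks retrieval, so the comparison module never sees the refuting evidence.

```python
import re
from typing import List, Set, Tuple

NEGATION_CUES = {"not", "no", "false", "debunked"}  # crude stand-in for an NLI model


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(claim: str, corpus: List[str], k: int = 1, min_score: float = 0.2) -> List[Tuple[float, str]]:
    """Retrieval module: top-k documents by lexical overlap, dropping weak matches."""
    claim_toks = tokens(claim)
    scored = sorted(((jaccard(claim_toks, tokens(d)), d) for d in corpus),
                    key=lambda p: p[0], reverse=True)
    return [p for p in scored[:k] if p[0] >= min_score]


def compare(claim: str, evidence: List[str]) -> str:
    """Comparison module: refute if evidence carries negation cues the claim lacks."""
    if not evidence:
        return "NOT ENOUGH INFO"
    claim_cues = tokens(claim) & NEGATION_CUES
    if any((tokens(d) & NEGATION_CUES) - claim_cues for d in evidence):
        return "REFUTED"
    return "SUPPORTED"


def detect(claim: str, corpus: List[str]) -> str:
    return compare(claim, [doc for _, doc in retrieve(claim, corpus)])


corpus = [
    "5G towers do not spread viruses, according to the WHO.",
    "The Eiffel Tower is located in Paris.",
]
original = "5G towers spread viruses."
rewrite = "Fifth-generation wireless masts transmit pathogens."
print(detect(original, corpus))  # REFUTED
print(detect(rewrite, corpus))   # NOT ENOUGH INFO
```

Real detectors use dense retrievers and learned claim-evidence models, so a lexical paraphrase alone would rarely be enough; the toy only isolates why an attack must target retrieval, comparison, or both rather than a single classifier.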

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Network Security and Intrusion Detection · Advanced Malware Detection Techniques

Methods: Focus