The Facade of Truth: Uncovering and Mitigating LLM Susceptibility to Deceptive Evidence
Herun Wan, Jiaying Wu, Minnan Luo, Fanxiao Li, Zhi Zeng, and Min-Yen Kan

TL;DR
This paper reveals a fundamental vulnerability of large language models to sophisticated deceptive evidence, introduces a framework to generate such evidence, and proposes a governance mechanism to mitigate this susceptibility.
Contribution
It introduces MisBelief, a framework for generating deceptive evidence against LLMs, and proposes Deceptive Intent Shielding (DIS) to mitigate model susceptibility.
Findings
Models are robust to direct misinformation but highly sensitive to refined deceptive evidence.
Under this refined evidence, belief scores in falsehoods increase by 93.0% on average.
DIS effectively mitigates belief shifts caused by deceptive evidence.
Abstract
To reliably assist human decision-making, LLMs must maintain factual internal beliefs against misleading injections. While current models resist explicit misinformation, we uncover a fundamental vulnerability to sophisticated, hard-to-falsify evidence. To systematically probe this weakness, we introduce MisBelief, a framework that generates misleading evidence via collaborative, multi-round interactions among multi-role LLMs. This process mimics subtle, defeasible reasoning and progressive refinement to create logically persuasive yet factually deceptive claims. Using MisBelief, we generate 4,800 instances across three difficulty levels to evaluate 7 representative LLMs. Results indicate that while models are robust to direct misinformation, they are highly sensitive to this refined evidence: belief scores in falsehoods increase by an average of 93.0%, fundamentally compromising…
Taxonomy
Topics: Misinformation and Its Impacts · Deception detection and forensic psychology · Ethics and Social Impacts of AI
