Hallucination as Exploit: Evidence-Carrying Multimodal Agents
Guijia Zhang, Hao Zheng, Harry Yang

TL;DR
This paper introduces evidence-carrying multimodal agents (ECA), which authorize a tool call only when verifier-issued certificates support it, so that a hallucinated perceptual claim cannot by itself trigger an unsafe action.
Contribution
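ECA formalizes hallucination-to-action conversion, decomposes each tool call into action-critical predicates, and authorizes privileges through a deterministic gate that accepts only typed certificates from constrained DOM/OCR/AX verifiers, which makes perception errors auditable rather than hidden.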
Findings
ECA achieves zero unsafe executions on 200 end-to-end tasks.
Four targeted hardening steps eliminate gate bypass in verification.
Content-derived certificates prevent unsafe actions in tested scenarios.
Abstract
Multimodal agents increasingly choose tool calls from screenshots, documents, and webpages, where a false perceptual claim can turn hallucination from an answer-quality error into an authorization failure. We formalize this failure mode as hallucination-to-action conversion: an unsupported claim supplies the precondition for a privileged action. We propose evidence-carrying multimodal agents (ECA), which treat free-form model text as inadmissible evidence, decompose each tool call into action-critical predicates, obtain typed certificates from constrained DOM/OCR/AX verifiers, and use a deterministic gate to authorize only the privileges those certificates support. Rather than hiding perception error, ECA converts opaque model belief into auditable residuals at the verifier, schema, and implementation levels. Verifier red-teaming across 17 canonical attack categories shows that four…
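The gating idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the names `Certificate`, `ToolCall`, `TRUSTED_SOURCES`, and `gate`, as well as the example predicates, are hypothetical and chosen only for illustration. The sketch assumes that each tool call declares the predicates it depends on, that verifiers return typed certificates tagged with their source, and that free-form model text is never an admissible source.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

# Hypothetical names throughout; none of these come from the paper.

@dataclass(frozen=True)
class Certificate:
    predicate: str  # action-critical predicate this certificate covers
    source: str     # verifier that issued it, e.g. "DOM", "OCR", or "AX"
    holds: bool     # whether the verifier confirmed the predicate


@dataclass(frozen=True)
class ToolCall:
    name: str
    required_predicates: FrozenSet[str]


# Assumption: only constrained verifiers count; model text is inadmissible.
TRUSTED_SOURCES = frozenset({"DOM", "OCR", "AX"})


def gate(call: ToolCall, certificates: List[Certificate]) -> Tuple[bool, List[str]]:
    """Deterministically authorize a call only if every required predicate
    is confirmed by a certificate from a trusted verifier.

    Returns (allowed, missing_predicates) so a denial is auditable.
    """
    supported = {
        c.predicate
        for c in certificates
        if c.holds and c.source in TRUSTED_SOURCES
    }
    missing = sorted(call.required_predicates - supported)
    return (not missing, missing)


call = ToolCall("submit_payment", frozenset({"recipient_matches", "amount_visible"}))
certs = [
    Certificate("recipient_matches", "DOM", True),
    # A claim that originates from model text is ignored by the gate.
    Certificate("amount_visible", "MODEL_TEXT", True),
]
allowed, missing = gate(call, certs)
print(allowed, missing)  # False ['amount_visible']
```

Returning the list of unsupported predicates alongside the decision is one way to surface the "auditable residuals" the abstract mentions: a denied call records exactly which precondition lacked verifier support.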
