SnapGuard: Lightweight Prompt Injection Detection for Screenshot-Based Web Agents
Mengyao Du, Han Fang, Haokai Ma, Jiahao Chen, Kai Xu, Quanjun Yin, and Ee-Chien Chang

TL;DR
SnapGuard is a lightweight, multimodal detection method that identifies prompt injection attacks on screenshot-based web agents by analyzing visual stability and textual signals, achieving high accuracy with low computational cost.
Contribution
The paper introduces SnapGuard, a novel lightweight detection approach that effectively identifies prompt injection attacks on visual web agents without relying on large vision-language models.
Findings
SnapGuard achieves an F1 score of 0.75 in detecting attacks.
It is 8 times faster than GPT-4-based methods.
It introduces no additional memory overhead.
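For readers unfamiliar with the metric behind the first finding, F1 is the harmonic mean of precision and recall. The sketch below shows the arithmetic only; the function name `f1_score` and the counts are hypothetical illustrations chosen to make the numbers easy to follow, and they are not taken from the paper's evaluation.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score from raw detection counts.

    tp: attacks correctly flagged
    fp: benign screenshots wrongly flagged
    fn: attacks that were missed
    """
    if tp == 0:
        # No true positives means precision and recall are both zero
        # (or undefined), so F1 is reported as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts (NOT from the paper): 60 attacks caught,
# 20 false alarms, 20 attacks missed.
example = f1_score(tp=60, fp=20, fn=20)
print(f"F1 = {example:.2f}")
```

Because F1 weights precision and recall equally, a detector can only reach a given score by keeping both false alarms and missed attacks in check.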
Abstract
Web agents have emerged as an effective paradigm for automating interactions with complex web environments, yet remain vulnerable to prompt injection attacks that embed malicious instructions into webpage content to induce unintended actions. This threat is further amplified for screenshot-based web agents, which operate on rendered visual webpages rather than structured textual representations, making predominant text-centric defenses ineffective. Although multimodal detection methods have been explored, they often rely on large vision-language models (VLMs), incurring significant computational overhead. The bottleneck lies in the complexity of modern webpages: VLMs must comprehend the global semantics of an entire page, resulting in substantial inference time and GPU memory usage. This raises a critical question: can we detect prompt injection attacks from screenshots in a lightweight…
