WebInject: Prompt Injection Attack to Web Agents
Xilong Wang, John Bloch, Zedian Shao, Yuepeng Hu, Shuyan Zhou, Neil Zhenqiang Gong

TL;DR
WebInject is a prompt injection attack on MLLM-based web agents: it adds an optimized perturbation to the raw pixel values of a rendered webpage so that the resulting screenshot induces the agent to perform an attacker-specified action.
Contribution
This work introduces WebInject, a prompt injection attack on web agents that optimizes a pixel perturbation and trains a neural network to approximate the non-differentiable mapping from raw pixel values to the screenshot.
Findings
WebInject significantly outperforms baseline attacks.
The attack effectively manipulates web agent actions.
The method is applicable across multiple datasets.
Abstract
Multi-modal large language model (MLLM)-based web agents interact with webpage environments by generating actions based on screenshots of the webpages. In this work, we propose WebInject, a prompt injection attack that manipulates the webpage environment to induce a web agent to perform an attacker-specified action. Our attack adds a perturbation to the raw pixel values of the rendered webpage. After these perturbed pixels are mapped into a screenshot, the perturbation induces the web agent to perform the attacker-specified action. We formulate the task of finding the perturbation as an optimization problem. A key challenge in solving this problem is that the mapping between raw pixel values and screenshot is non-differentiable, making it difficult to backpropagate gradients to the perturbation. To overcome this, we train a neural network to approximate the mapping and apply projected…
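The abstract is cut off before it names the optimizer, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes a projected-gradient-style loop under an L-infinity budget, and it replaces both the learned pixel-to-screenshot network and the agent's loss with a toy linear surrogate and a squared-error objective so that the gradient can be written by hand. All names (`surrogate_loss`, `optimize_perturbation`, `eps`, `step`) are hypothetical.

```python
import numpy as np

def surrogate_loss(pixels, surrogate, target):
    """Toy objective: squared distance between the surrogate's output and a target.

    `surrogate` is a matrix standing in for a differentiable approximation of the
    raw-pixel-to-screenshot mapping (an assumption made for illustration).
    """
    residual = surrogate @ pixels - target
    return float(residual @ residual)

def optimize_perturbation(pixels, surrogate, target, eps=0.05, step=0.25, iters=50):
    """Gradient steps on the perturbation, each followed by a projection.

    The projection keeps the perturbation inside an L-infinity ball of radius
    `eps` and keeps the perturbed pixels inside the valid range [0, 1].
    """
    delta = np.zeros_like(pixels)
    for _ in range(iters):
        residual = surrogate @ (pixels + delta) - target
        grad = 2.0 * surrogate.T @ residual          # analytic gradient of the toy loss
        delta = delta - step * grad                  # gradient descent step
        delta = np.clip(delta, -eps, eps)            # project onto the L-infinity ball
        delta = np.clip(pixels + delta, 0.0, 1.0) - pixels  # keep pixel values valid
    return delta
```

In a realistic setting the hand-written gradient would be replaced by automatic differentiation through the trained approximation network and the agent model; the projection step is the part this sketch is meant to illustrate.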
Taxonomy
Topics: Web Application Security Vulnerabilities · Spam and Phishing Detection · Adversarial Robustness in Machine Learning
