TL;DR
This paper shows that adversaries can embed adversarial triggers in a webpage's HTML that reach LLM-based web navigation agents through the accessibility tree and hijack their behavior, enabling attacks such as login credential exfiltration and forced ad clicks.
Contribution
The study introduces a novel indirect prompt injection attack method exploiting HTML accessibility features against LLM web agents, with empirical validation on real websites.
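To make the attack surface concrete, the following is a minimal sketch of how attacker-controlled attribute text can end up in an agent's observation. It assumes a heavily simplified, hypothetical stand-in for an accessibility tree built with Python's standard `html.parser`; it is not the paper's pipeline, and the names `SimplifiedAXView`, `build_observation`, and `PLACEHOLDER` are illustrative only. The placeholder string is benign and is not an optimized trigger.

```python
from html.parser import HTMLParser

# Benign stand-in for attacker-controlled text (assumption: not a real trigger).
PLACEHOLDER = "UNTRUSTED_PLACEHOLDER_TEXT"

# Hypothetical page: the aria-label is invisible to a human reader but is
# part of the element's accessible name.
PAGE = f"""
<html><body>
  <h1>Checkout</h1>
  <button aria-label="{PLACEHOLDER}">Buy now</button>
</body></html>
"""


class SimplifiedAXView(HTMLParser):
    """Toy accessibility-style view: keeps aria-label values and text nodes."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Record the accessible name when an aria-label attribute is present.
        label = dict(attrs).get("aria-label")
        if label:
            self.lines.append(f"{tag} name={label!r}")

    def handle_data(self, data):
        # Record non-empty text content.
        text = data.strip()
        if text:
            self.lines.append(f"text {text!r}")


def build_observation(html: str) -> str:
    """Flatten the page into the text an agent prompt might include."""
    view = SimplifiedAXView()
    view.feed(html)
    view.close()
    return "\n".join(view.lines)


observation = build_observation(PAGE)
# The untrusted attribute text is now concatenated with the trusted task text.
prompt = "Goal: complete the purchase.\nPage:\n" + observation
print(prompt)
```

The point of the sketch is only that text a page author controls is placed in the same prompt as the user's goal, with nothing marking it as untrusted. Real agent frameworks derive the accessibility tree from the browser rather than from raw HTML, so the details differ.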
Findings
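High attack success rates across real websites, in both targeted and general attacks
Hijacked agents can be driven to exfiltrate login credentials or click ads
Stronger defenses are needed as LLM-driven autonomous web agents become more widely adopted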
Abstract
This work demonstrates that LLM-based web navigation agents offer powerful automation capabilities but are vulnerable to Indirect Prompt Injection (IPI) attacks. We show that adversaries can embed universal adversarial triggers in webpage HTML to hijack the behavior of agents that use the accessibility tree to parse HTML, causing unintended or malicious actions. Using the Greedy Coordinate Gradient (GCG) algorithm and a BrowserGym agent powered by Llama-3.1, our system demonstrates high success rates across real websites in both targeted and general attacks, including login credential exfiltration and forced ad clicks. Our empirical results highlight critical security risks and the need for stronger defenses as LLM-driven autonomous web agents become more widely adopted. The system software (https://github.com/sej2020/manipulating-web-agents) is released under the MIT License, with an…