LivePI: More Realistic Benchmarking of Agents Against Indirect Prompt Injection
Lei Zhao, Abhay Bhaskar, Edgar Dobriban

TL;DR
LivePI is a comprehensive benchmark testing AI agents against indirect prompt injection risks across multiple input channels in a realistic environment, revealing attack success rates and evaluating defenses.
Contribution
The paper introduces LivePI, a structured, multi-channel benchmark for assessing IPI risks in AI agents within a production-like setting, including evaluation of defenses.
Findings
Attack success rates range from 10.7% to 29.6%.
Group-chat injection attacks are highly successful.
Prompt filtering and tool-call authorization can intercept malicious actions.
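To make the tool-call authorization idea concrete, here is a minimal sketch of a gate that sits between an agent and its tools. It is an illustration only, not the defense evaluated in the paper: the `ToolCall` type, the `SENSITIVE_TOOLS` set, the tool names, and the `gate` function are all hypothetical, and the policy (block any sensitive tool the user has not explicitly approved) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    """A tool invocation proposed by the agent (hypothetical structure)."""
    tool: str
    args: dict


# Hypothetical tool names whose side effects match the kinds of malicious
# goals an injected instruction might pursue.
SENSITIVE_TOOLS = {"send_email", "run_shell", "transfer_crypto"}


def authorize(call: ToolCall, user_approved: set) -> bool:
    """Allow non-sensitive tools; allow sensitive ones only if the user approved them."""
    if call.tool not in SENSITIVE_TOOLS:
        return True
    return call.tool in user_approved


def gate(calls: list, user_approved: set) -> tuple:
    """Split proposed calls into (allowed, blocked) lists, preserving order."""
    allowed, blocked = [], []
    for call in calls:
        if authorize(call, user_approved):
            allowed.append(call)
        else:
            blocked.append(call)
    return allowed, blocked


# Example: an agent that read an untrusted email proposes two calls.
proposed = [
    ToolCall("read_file", {"path": "notes.txt"}),
    ToolCall("transfer_crypto", {"to": "attacker-wallet", "amount": 1}),
]
allowed, blocked = gate(proposed, user_approved=set())
```

In this sketch the decision depends only on the tool name and the user's explicit approvals, never on text the agent read from an untrusted channel, so an injected instruction cannot approve its own action.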
Abstract
AI agents such as OpenClaw are increasingly deployed in local workflows with access to external tools. This creates indirect prompt-injection (IPI) risk: an agent may execute harmful instructions embedded in untrusted inputs such as email, downloaded files, webpages, repositories, or group-chat messages. Existing evaluations are often small, purely simulated, or focused on a narrow set of channels. We introduce LivePI (Live Prompt Injection), a structured benchmark for IPI risk in a production-like but test-controlled environment. LivePI covers seven input surfaces, twelve attack/rendering families, and five malicious goals, including protected-information exfiltration, unauthorized security-control changes, unsafe code retrieval or execution, inbox-summary exfiltration, and cryptocurrency transfer. We run LivePI on a real virtual machine with live but test-controlled email, chat, web,…
