AgentVisor: Defending LLM Agents Against Prompt Injection via Semantic Virtualization
Zonghao Ying, Haozheng Wang, Jiangfan Liu, Quanchen Zou, Aishan Liu, Jian Yang, Yaodong Yang, Xianglong Liu

TL;DR
AgentVisor is a defense framework inspired by OS virtualization. It enforces semantic privilege separation to protect LLM agents against direct and indirect prompt injection while preserving utility.
Contribution
AgentVisor introduces semantic privilege separation, enforced through an audit protocol and a self-correction mechanism. This reduces the attack success rate to 0.65% while lowering utility by only 1.45% relative to no defense.
Findings
Attack success rate reduced to 0.65%
Utility decreased by only 1.45% compared to no defense
Outperforms existing defense methods in security and utility balance
Abstract
Large Language Model (LLM) agents are increasingly used to automate complex workflows, but integrating untrusted external data with privileged execution exposes them to severe security risks, particularly direct and indirect prompt injection. Existing defenses face significant challenges in balancing security with utility, often encountering a trade-off where rigorous protection leads to over-defense, or where subtle indirect injections bypass detection. Drawing inspiration from operating system virtualization, we propose AgentVisor, a novel defense framework that enforces semantic privilege separation. AgentVisor treats the target agent as an untrusted guest and intercepts tool calls via a trusted semantic visor. Central to our approach is a rigorous audit protocol grounded in classic OS security primitives, designed to systematically mitigate both direct and indirect injection…
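The interception pattern described in the abstract (an untrusted guest agent whose tool calls pass through a trusted visor) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `ToolCall`, `Verdict`, and `SemanticVisor` are hypothetical, and the audit step here is a simple per-task allow-list standing in for the paper's audit protocol, whose details are not given in this summary. Returning feedback on a blocked call is one assumed way a guest agent could be prompted to self-correct.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Tuple


@dataclass
class ToolCall:
    """A tool invocation proposed by the (untrusted) guest agent."""
    name: str
    args: Dict[str, Any]


@dataclass
class Verdict:
    """Outcome of auditing one tool call."""
    allowed: bool
    reason: str


class SemanticVisor:
    """Hypothetical trusted layer sitting between the agent and its tools."""

    def __init__(self, tools: Dict[str, Callable[..., Any]],
                 allowed_tools: Iterable[str]) -> None:
        self.tools = tools
        # Assumption: the user's task determines which tools are authorized.
        self.allowed_tools = set(allowed_tools)
        self.log: List[Tuple[str, bool]] = []

    def audit(self, call: ToolCall) -> Verdict:
        # Placeholder policy (allow-list), NOT the paper's audit protocol.
        if call.name not in self.tools:
            return Verdict(False, f"unknown tool: {call.name}")
        if call.name not in self.allowed_tools:
            return Verdict(False, f"tool not authorized for this task: {call.name}")
        return Verdict(True, "ok")

    def execute(self, call: ToolCall) -> Dict[str, Any]:
        verdict = self.audit(call)
        self.log.append((call.name, verdict.allowed))
        if not verdict.allowed:
            # Do not run the tool; hand the reason back so the agent can revise.
            return {"status": "blocked", "feedback": verdict.reason}
        return {"status": "ok", "result": self.tools[call.name](**call.args)}


# Toy tools for illustration only.
tools = {
    "read_email": lambda inbox: f"3 messages in {inbox}",
    "send_money": lambda to, amount: f"sent {amount} to {to}",
}

# The user's task only requires reading email.
visor = SemanticVisor(tools, allowed_tools=["read_email"])

benign = visor.execute(ToolCall("read_email", {"inbox": "work"}))
# A call the agent might emit after reading injected instructions in an email.
injected = visor.execute(ToolCall("send_money", {"to": "attacker", "amount": 100}))
```

In this sketch the agent never holds a direct reference to the tools, so every call goes through `execute`. A real audit would need to reason about the semantics of each call rather than only its name.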
