Securing AI Agents with Information-Flow Control
Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin

TL;DR
This paper introduces a formal framework and a new planner, Fides, that uses information-flow control to enhance the security of AI agents, particularly against prompt injection, while maintaining task utility.
Contribution
It develops a formal model for security in AI agents, characterizes enforceable properties, and presents Fides, a planner with novel security primitives and demonstrated effectiveness.
Findings
Fides enforces security policies through dynamic taint-tracking.
The approach enables secure completion of diverse tasks.
Evaluation in AgentDojo shows that Fides completes a broad range of tasks with security guarantees.
Abstract
As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of properties enforceable by dynamic taint-tracking and construct a taxonomy of tasks to evaluate security and utility trade-offs of planner designs. Informed by this exploration, we present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information. Its evaluation in AgentDojo demonstrates that this approach enables us to complete a broad range of tasks with security guarantees. A…
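To make the idea of tracking confidentiality and integrity labels concrete, here is a minimal sketch of dynamic taint-tracking with a deterministic policy check before a tool call. It is not the Fides implementation: the two-point label lattices, the names `Label`, `Labeled`, `concat`, `send_email`, and `PolicyViolation`, and the email policy itself are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Integrity(IntEnum):
    UNTRUSTED = 0  # e.g. content returned by a tool that reads external data
    TRUSTED = 1    # e.g. text typed by the user


class Confidentiality(IntEnum):
    PUBLIC = 0
    SECRET = 1


@dataclass(frozen=True)
class Label:
    integrity: Integrity
    confidentiality: Confidentiality

    def join(self, other: "Label") -> "Label":
        # Derived data is only as trusted as its least trusted input
        # and at least as secret as its most secret input.
        return Label(
            min(self.integrity, other.integrity),
            max(self.confidentiality, other.confidentiality),
        )


@dataclass(frozen=True)
class Labeled:
    value: str
    label: Label


class PolicyViolation(Exception):
    """Raised when a tool call would violate the (hypothetical) policy."""


def concat(a: Labeled, b: Labeled) -> Labeled:
    # Taint propagation: the result carries the join of both input labels.
    return Labeled(a.value + b.value, a.label.join(b.label))


def send_email(recipient: Labeled, body: Labeled,
               clearance: Confidentiality = Confidentiality.PUBLIC) -> str:
    # Hypothetical policy, checked deterministically before the call:
    # 1. the recipient must not be influenced by untrusted data;
    # 2. the body must not be more secret than the recipient's clearance.
    if recipient.label.integrity < Integrity.TRUSTED:
        raise PolicyViolation("recipient derived from untrusted data")
    if body.label.confidentiality > clearance:
        raise PolicyViolation("body too confidential for recipient")
    return f"sent to {recipient.value}"
```

Under these assumptions, an address that was concatenated from user input and untrusted tool output inherits the `UNTRUSTED` label, so `send_email` rejects it no matter what the injected text says. The check depends only on labels, not on a model's judgment of the content.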
Taxonomy
Topics: Security and Verification in Computing · Access Control and Trust · Adversarial Robustness in Machine Learning
