Securing AI Agents with Information-Flow Control

Manuel Costa; Boris Köpf; Aashish Kolluri; Andrew Paverd; Mark Russinovich; Ahmed Salem; Shruti Tople; Lukas Wutschitz; Santiago Zanella-Béguelin

arXiv:2505.23643 · cs.CR · September 4, 2025 · 2 citations

TL;DR

This paper introduces a formal framework and a new planner, Fides, that uses information-flow control to enhance the security of AI agents, particularly against prompt injection, while maintaining task utility.

Contribution

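The paper develops a formal model for reasoning about the security and expressiveness of agent planners, characterizes the class of properties enforceable by dynamic taint-tracking, and presents Fides, a planner that tracks confidentiality and integrity labels and adds novel primitives for selectively hiding information.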

Findings

01 Fides enforces security policies through dynamic taint-tracking.

02 The approach enables secure completion of diverse tasks.

03 Evaluation shows effective security guarantees in AgentDojo.

Abstract

As AI agents become increasingly autonomous and capable, ensuring their security against vulnerabilities such as prompt injection becomes critical. This paper explores the use of information-flow control (IFC) to provide security guarantees for AI agents. We present a formal model to reason about the security and expressiveness of agent planners. Using this model, we characterize the class of properties enforceable by dynamic taint-tracking and construct a taxonomy of tasks to evaluate security and utility trade-offs of planner designs. Informed by this exploration, we present Fides, a planner that tracks confidentiality and integrity labels, deterministically enforces security policies, and introduces novel primitives for selectively hiding information. Its evaluation in AgentDojo demonstrates that this approach enables us to complete a broad range of tasks with security guarantees. A…
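
To make the idea of dynamic taint-tracking over confidentiality and integrity labels concrete, here is a minimal sketch in Python. It is not the Fides implementation or its API: the names `Label`, `Labeled`, `concat`, `send_email`, and `PolicyViolation` are hypothetical, the two-point lattices (trusted/untrusted, public/secret) are an assumed simplification, and the policy checked at the tool call is an illustrative one chosen for this example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Integrity(IntEnum):
    # Higher value = less trustworthy (assumed two-point lattice).
    TRUSTED = 0
    UNTRUSTED = 1


class Confidentiality(IntEnum):
    # Higher value = more sensitive (assumed two-point lattice).
    PUBLIC = 0
    SECRET = 1


@dataclass(frozen=True)
class Label:
    integrity: Integrity
    confidentiality: Confidentiality

    def join(self, other: "Label") -> "Label":
        # Least upper bound: the result is as untrusted and as secret
        # as the most untrusted / most secret input.
        return Label(
            max(self.integrity, other.integrity),
            max(self.confidentiality, other.confidentiality),
        )


@dataclass(frozen=True)
class Labeled:
    value: str
    label: Label


def concat(a: Labeled, b: Labeled) -> Labeled:
    # Any value derived from two inputs carries the join of their labels.
    return Labeled(a.value + b.value, a.label.join(b.label))


class PolicyViolation(Exception):
    pass


def send_email(recipient: Labeled, body: Labeled) -> str:
    # Hypothetical tool with an illustrative policy, checked before any effect:
    # the recipient must not be influenced by untrusted data, and the body
    # must not contain secret data.
    if recipient.label.integrity == Integrity.UNTRUSTED:
        raise PolicyViolation("recipient derived from untrusted data")
    if body.label.confidentiality == Confidentiality.SECRET:
        raise PolicyViolation("body contains secret data")
    return f"sent to {recipient.value}"


if __name__ == "__main__":
    user = Labeled("bob@example.com", Label(Integrity.TRUSTED, Confidentiality.PUBLIC))
    web = Labeled("attacker@example.com", Label(Integrity.UNTRUSTED, Confidentiality.PUBLIC))
    note = Labeled("Lunch at noon?", Label(Integrity.TRUSTED, Confidentiality.PUBLIC))
    print(send_email(user, note))
    try:
        send_email(web, note)  # recipient came from untrusted tool output
    except PolicyViolation as err:
        print("blocked:", err)
```

The check is deterministic: whether a call is allowed depends only on the labels of its arguments, not on what a model decides. A prompt injection that steers the recipient toward attacker-controlled text therefore fails at the tool boundary in this sketch.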

Code & Models

Repositories

microsoft/fides (Official)

Taxonomy

Topics: Security and Verification in Computing · Access Control and Trust · Adversarial Robustness in Machine Learning