Who Grants the Agent Power? Defending Against Instruction Injection via Task-Centric Access Control

Yifeng Cai; Ziming Wang; Zhaomeng Deng; Mengyu Yao; Junlin Liu; Yutao Hu; Ziqi Zhang; Yao Guo; Ding Li

arXiv:2510.26212·cs.CR·October 31, 2025

TL;DR

This paper introduces AgentSentry, a dynamic access control framework that enforces task-specific permissions to prevent instruction injection attacks in autonomous AI agents, enhancing security without hindering legitimate tasks.

Contribution

AgentSentry is the first lightweight runtime system that dynamically enforces task-centric permissions to defend against instruction injection in AI agents.

Findings

01

In the demonstrated scenario, AgentSentry blocks an instruction injection attack in which the agent is tricked into forwarding private emails.

02

Dynamic, task-specific permissions improve security without impeding legitimate tasks.

03

The framework is lightweight and adaptable for real-world deployment.

Abstract

AI agents that support GUI understanding and the Model Context Protocol (MCP) are increasingly deployed to automate mobile tasks. However, their reliance on over-privileged, static permissions creates a critical vulnerability: instruction injection. Malicious instructions, embedded in otherwise benign content like emails, can hijack the agent to perform unauthorized actions. We present AgentSentry, a lightweight runtime task-centric access control framework that enforces dynamic, task-scoped permissions. Instead of granting broad, persistent permissions, AgentSentry dynamically generates and enforces minimal, temporary policies aligned with the user's specific task (e.g., register for an app), revoking them upon completion. We demonstrate that AgentSentry successfully prevents an instruction injection attack, where an agent is tricked into forwarding private emails, while allowing the legitimate…
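
The task-scoped permission idea in the abstract can be illustrated with a small Python sketch. This is not AgentSentry's implementation: the names `TaskPolicy`, `PolicyEnforcer`, `task_scope`, and `check`, as well as the `(tool, action)` permission pairs, are hypothetical and chosen only for illustration. The policy here is written by hand, whereas the paper describes generating it dynamically from the user's task.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional, Tuple


class PermissionDenied(Exception):
    """Raised when an action falls outside the active task policy."""


@dataclass(frozen=True)
class TaskPolicy:
    # Hypothetical policy shape: a task label plus the (tool, action) pairs it needs.
    task: str
    allowed: FrozenSet[Tuple[str, str]]


@dataclass
class PolicyEnforcer:
    # No policy is active by default, so every action is denied outside a task.
    active: Optional[TaskPolicy] = None
    audit_log: List[Tuple[str, str, bool]] = field(default_factory=list)

    @contextmanager
    def task_scope(self, policy: TaskPolicy):
        """Grant the policy for the duration of one task, then revoke it."""
        self.active = policy
        try:
            yield self
        finally:
            self.active = None  # revoke on completion, even if the task fails

    def check(self, tool: str, action: str) -> None:
        """Record the request and raise unless the active policy allows it."""
        permitted = self.active is not None and (tool, action) in self.active.allowed
        self.audit_log.append((tool, action, permitted))
        if not permitted:
            raise PermissionDenied(f"{tool}.{action} is not allowed for the current task")


# Assumed example: registering for an app only needs to read a
# verification email and fill in the registration form.
enforcer = PolicyEnforcer()
register_policy = TaskPolicy(
    task="register for an app",
    allowed=frozenset({("email", "read_verification_code"), ("app", "fill_form")}),
)

blocked = False
with enforcer.task_scope(register_policy):
    enforcer.check("email", "read_verification_code")  # needed by the task
    try:
        # An instruction injected via email content asks the agent to forward mail.
        enforcer.check("email", "forward")
    except PermissionDenied:
        blocked = True
```

In this sketch the injected `forward` request is rejected because it was never part of the task's policy, and once the `with` block exits no permission remains in force. A default-deny enforcer keeps the check simple: anything not explicitly granted for the current task is refused.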
