Optimizing Agent Planning for Security and Autonomy

Aashish Kolluri; Rishi Sharma; Manuel Costa; Boris Köpf; Tobias Nießen; Mark Russinovich; Shruti Tople; Santiago Zanella-Béguelin

arXiv:2602.11416·cs.CR·February 13, 2026

Open Access 3 Reviews

TL;DR

This paper presents a security-aware agent design that enhances autonomy by reducing human oversight needs while maintaining security, using system-level defenses against prompt injection attacks evaluated on benchmark tasks.

Contribution

It introduces autonomy metrics and a planning approach that balances task progress with policy compliance, improving autonomous execution under security constraints.

Findings

01 Higher autonomy achieved without utility loss

02 Effective defense against prompt injection attacks

03 Improved planning for security and task completion

Abstract

Indirect prompt injection attacks threaten AI agents that execute consequential actions, motivating deterministic system-level defenses. Such defenses can provably block unsafe actions by enforcing confidentiality and integrity policies, but currently appear costly: they reduce task completion rates and increase token usage compared to probabilistic defenses. We argue that existing evaluations miss a key benefit of system-level defenses: reduced reliance on human oversight. We introduce autonomy metrics to quantify this benefit: the fraction of consequential actions an agent can execute without human-in-the-loop (HITL) approval while preserving security. To increase autonomy, we design a security-aware agent that (i) introduces richer HITL interactions, and (ii) explicitly plans for both task progress and policy compliance. We implement this agent design atop an existing…
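The autonomy metric described in the abstract can be illustrated with a minimal sketch. It assumes each agent action is logged with two flags: whether it is consequential and whether it required HITL approval. The `Action` record, its field names, and the example trace are hypothetical and used only for illustration; the paper's exact definitions may differ.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Action:
    """Hypothetical log record for one agent action (not from the paper)."""
    tool: str
    consequential: bool  # the action has side effects that matter for security
    needed_hitl: bool    # a human had to approve the action before it ran


def autonomy(actions: Sequence[Action]) -> Optional[float]:
    """Fraction of consequential actions executed without HITL approval.

    Returns None when the trace has no consequential actions, because the
    ratio is undefined in that case.
    """
    consequential = [a for a in actions if a.consequential]
    if not consequential:
        return None
    autonomous = sum(1 for a in consequential if not a.needed_hitl)
    return autonomous / len(consequential)


# Made-up trace: four consequential actions, two of which needed approval.
trace = [
    Action("read_inbox", consequential=False, needed_hitl=False),
    Action("send_email", consequential=True, needed_hitl=False),
    Action("transfer_money", consequential=True, needed_hitl=True),
    Action("create_event", consequential=True, needed_hitl=False),
    Action("delete_file", consequential=True, needed_hitl=True),
]

print(autonomy(trace))  # 0.5
```

Non-consequential actions such as the read are excluded from the denominator, so the ratio reflects only the actions for which oversight would matter.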

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01·Rating 6·Confidence 3

Strengths

1. The research objective is interesting and bridges the gap between agents in research and agents in the real world by taking human confirmation into account. 2. The paper is well organized; in the evaluation section especially, the findings and the corresponding empirical evidence are clear. The models are also essentially state of the art, which makes the results more convincing.

Weaknesses

1. It is not clear why the paper claims the agent can achieve a security "guarantee". If I understand correctly, policy violations may still occur even with policy-aware planning. 2. The paper proposes metrics motivated by real agents, but provides no evaluation on such agents. 3. One key insight is that task completion rate is not a good metric for real-world agents with human interactions, but no qualitative or quantitative analysis directly shows how bad the task completion rate cou

Reviewer 02·Rating 6·Confidence 4

Strengths

1. The focused problem and proposed metrics are realistic. In the experience of using LLM-based AI agents, HITL approvals are disruptive and slow down the task completion process, yet they are necessary for security. 2. Policy and security label awareness encourage the agent to use safe and trusted tools and data. With this feature integrated into planning, the AI agent can reduce vulnerability at the root. This both reduces HITL interventions and avoids malicious actions. 3. The dual LLM design

Weaknesses

1. Evaluation. Both AgentDojo and WASP (based on VisualWebArena) contain different subtasks or websites. Previous work like CaMeL [1] also shows results on the subtasks. I suggest the authors include this analysis, since different tasks may require different steps and HITL interventions fundamentally, and averaging them out might not be reasonable. 2. Policy-aware planning needs additional analysis on the total number of steps and the time required to complete the tasks. The agents will try not

Reviewer 03·Rating 6·Confidence 3

Strengths

1. This paper presents the PRUDENTIA agent design and a deterministic system-level defense; the work is well motivated and aims to address a critical issue in agent security research. 2. To quantify the autonomy benefits and evaluate the proposed approach, this paper introduces new autonomy metrics and conducts extensive experiments. 3. This paper has good real-world applications, e.g., reducing security risks, improving user trust, and integrating external data into AI agents more safely.

Weaknesses

1. This paper focuses only on indirect prompt injection attacks; how well the proposed approach generalizes to other types of attacks is not discussed. 2. While the proposed approach outperforms baselines in autonomy, its completion rate shows some regression (73.2% vs. FIDES' 75.7%). This might suggest a minor trade-off between autonomy and performance, but this is not discussed.

Taxonomy

Topics: Security and Verification in Computing · Adversarial Robustness in Machine Learning · Information and Cyber Security