PlanGuard: Defending Agents against Indirect Prompt Injection via Planning-based Consistency Verification

Guangyu Gong; Zizhuang Deng

arXiv:2604.10134·cs.CR·April 14, 2026

TL;DR

PlanGuard is a training-free framework that enhances LLM agent security against indirect prompt injection by using planning and hierarchical verification to ensure behavior aligns with user instructions.

Contribution

The paper introduces a planning-based consistency verification method that defends against indirect prompt injection (IPI) attacks without retraining the model.

Findings

01. PlanGuard reduces attack success rate from 72.8% to 0%.

02. It maintains a low false positive rate of 1.49%.

03. The method is model-agnostic and compatible with various systems.

Abstract

Large Language Model (LLM) agents are increasingly integrated into critical systems, leveraging external tools to interact with the real world. However, this capability exposes them to Indirect Prompt Injection (IPI), where attackers embed malicious instructions into retrieved content to manipulate the agent into executing unauthorized or unintended actions. Existing defenses predominantly focus on the pre-processing stage, neglecting the monitoring of the model's actual behavior. In this paper, we propose PlanGuard, a training-free defense framework based on the principle of Context Isolation. Unlike prior methods, PlanGuard introduces an isolated Planner that generates a reference set of valid actions derived solely from user instructions. In addition, we design a Hierarchical Verification Mechanism that first enforces strict hard constraints to block unauthorized tool invocations,…
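
The first verification stage described above (an isolated Planner producing a reference set of valid actions, followed by a hard constraint that blocks tool calls outside that set) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `plan_allowed_tools`, `hard_constraint_check`, and `ToolCall` are hypothetical, and a keyword lookup stands in for the LLM-based Planner. The later verification stage is not shown because the abstract is truncated before describing it.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """A tool invocation proposed by the agent (hypothetical structure)."""
    tool: str
    args: dict = field(default_factory=dict)


def plan_allowed_tools(user_instruction: str) -> frozenset:
    """Stand-in for the isolated Planner.

    Assumption: a real Planner would be an LLM call. The key property kept
    here is context isolation: the function sees only the user instruction,
    never retrieved content that an attacker could control.
    """
    keyword_to_tool = {
        "email": "send_email",
        "calendar": "read_calendar",
        "balance": "get_balance",
    }
    text = user_instruction.lower()
    return frozenset(tool for kw, tool in keyword_to_tool.items() if kw in text)


def hard_constraint_check(call: ToolCall, allowed: frozenset) -> bool:
    """Return True only if the proposed tool is in the reference set."""
    return call.tool in allowed


# The reference set is derived from the user's request alone.
allowed = plan_allowed_tools("Check my calendar for tomorrow")

# A call consistent with the plan passes the hard constraint.
benign = ToolCall("read_calendar", {"date": "tomorrow"})

# A call induced by injected text in retrieved content is rejected,
# because the Planner never saw that text.
injected = ToolCall("send_email", {"to": "attacker@example.com"})

benign_ok = hard_constraint_check(benign, allowed)
injected_ok = hard_constraint_check(injected, allowed)
```

In this sketch `benign_ok` is true and `injected_ok` is false: the injected call fails because `send_email` was never part of the plan derived from the user's request, regardless of what the retrieved content says.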
