TL;DR
ClawGuard is a runtime security framework that enforces user-confirmed rules at tool-call boundaries, defending tool-augmented LLM agents against indirect prompt injection without modifying the underlying models.
Contribution
It presents a deterministic, auditable defense mechanism that derives task-specific access constraints from the user's stated objective to block injection pathways, improving security without infrastructure changes.
Findings
Achieves robust protection against indirect prompt injection across multiple models and benchmarks.
Maintains agent utility and incurs minimal token overhead.
Demonstrates effectiveness without model modification or infrastructure change.
Abstract
Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions within tool-returned content, which agents directly incorporate into their conversation history as trusted observations. To address these vulnerabilities, we introduce ClawGuard, a novel runtime security framework that enforces a user-confirmed rule set at every tool-call boundary, transforming unreliable alignment-dependent defense into a deterministic, auditable mechanism that intercepts adversarial tool calls before any real-world effect is produced. By automatically deriving task-specific access constraints from the user's stated objective prior to any external tool invocation, ClawGuard blocks all three…
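The abstract describes the core mechanism: a rule set confirmed by the user before any external tool runs, then checked at every tool-call boundary so that a non-matching call is stopped before it has any real-world effect. The Python sketch below illustrates that idea under stated assumptions. The `Rule` and `ToolCallGuard` names, the glob-based argument matching, the default-deny policy for unlisted tools, and the audit-log format are all hypothetical and are not ClawGuard's actual API or rule language. The step in which rules are derived from the user's objective is represented only by a hard-coded, already-confirmed rule list.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: permits `tool` only if argument `arg` matches `pattern` (glob)."""
    tool: str
    arg: str
    pattern: str


@dataclass
class ToolCallGuard:
    """Hypothetical guard checked at every tool-call boundary (not ClawGuard's real API)."""
    rules: list[Rule]
    # Each entry records (tool name, arguments, allowed?) so decisions can be audited later.
    audit_log: list[tuple[str, dict, bool]] = field(default_factory=list)

    def allows(self, tool: str, args: dict[str, Any]) -> bool:
        tool_rules = [r for r in self.rules if r.tool == tool]
        if not tool_rules:
            return False  # assumed default deny: no confirmed rule covers this tool
        # Every rule for this tool must match; a missing argument never matches.
        return all(fnmatchcase(str(args.get(r.arg, "")), r.pattern) for r in tool_rules)

    def call(self, tool: str, args: dict[str, Any], impl: Callable[..., Any]) -> Any:
        ok = self.allows(tool, args)
        self.audit_log.append((tool, dict(args), ok))
        if not ok:
            # The call is intercepted before the tool implementation ever runs.
            raise PermissionError(f"blocked tool call: {tool}({args})")
        return impl(**args)


def send_email(to: str, body: str) -> str:
    """Stand-in tool used only for illustration."""
    return f"sent to {to}"


# Hypothetical rules a user might confirm for the objective
# "email the Q3 report to alice@example.com".
guard = ToolCallGuard(rules=[
    Rule("send_email", "to", "alice@example.com"),
    Rule("read_file", "path", "/reports/q3/*"),
])

print(guard.call("send_email", {"to": "alice@example.com", "body": "Q3 report"}, send_email))
try:
    # A call an injected instruction might trigger: same tool, different recipient.
    guard.call("send_email", {"to": "attacker@evil.test", "body": "secrets"}, send_email)
except PermissionError as err:
    print(err)
```

Because the check is a plain rule match rather than a model judgment, the same call always receives the same verdict, and the audit log records why each call was allowed or blocked. This mirrors the "deterministic, auditable" property the abstract claims, though the real rule language is likely richer than glob matching.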
