CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution

Minbeom Kim; Mihir Parmar; Phillip Wallis; Lesly Miculicich; Kyomin Jung; Krishnamurthy Dj Dvijotham; Long T. Le; Tomas Pfister

arXiv:2602.07918·cs.CR·February 10, 2026

CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution

Minbeom Kim, Mihir Parmar, Phillip Wallis, Lesly Miculicich, Kyomin Jung, Krishnamurthy Dj Dvijotham, Long T. Le, Tomas Pfister

PDF

Open Access

TL;DR

CausalArmor is a selective defense framework for AI agents that uses causal attribution to identify and mitigate indirect prompt injection attacks, improving security without sacrificing utility or latency.

Contribution

It introduces a causal attribution-based method for targeted sanitization, reducing over-defense and enhancing explainability in defending against IPI attacks.

Findings

01

Matches security of aggressive defenses

02

Improves explainability of defense mechanism

03

Preserves utility and latency in AI agents

Abstract

AI agents equipped with tool-calling capabilities are susceptible to Indirect Prompt Injection (IPI) attacks. In this attack scenario, malicious commands hidden within untrusted content trick the agent into performing unauthorized actions. Existing defenses can reduce attack success but often suffer from the over-defense dilemma: they deploy expensive, always-on sanitization regardless of actual threat, thereby degrading utility and latency even in benign scenarios. We revisit IPI through a causal ablation perspective: a successful injection manifests as a dominance shift where the user request no longer provides decisive support for the agent's privileged action, while a particular untrusted segment, such as a retrieved document or tool output, provides disproportionate attributable influence. Based on this signature, we propose CausalArmor, a selective defense framework that (i)…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Security and Verification in Computing · Explainable Artificial Intelligence (XAI)