The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents
Feiran Jia, Tong Wu, Xin Qin, Anna Squicciarini

TL;DR
The paper introduces Task Shield, a novel defense mechanism for LLM agents that checks whether each action aligns with the user's goals. It substantially reduces the success of indirect prompt injection attacks while maintaining task performance.
Contribution
It presents a new security approach centered on task alignment. A test-time system verifies that each agent action contributes to the user's objectives, which makes the agent more robust against attacks.
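The sketch below shows the shape of such a test-time check. Every name in it (`ToolCall`, `TaskShieldSketch`, `keyword_judge`) is hypothetical and does not come from the paper. Before a proposed tool call executes, a judge decides whether the call serves one of the user's stated goals. Misaligned calls are blocked and answered with feedback instead of being run. In the paper this alignment judgment is made by a language model. Here a toy keyword-overlap judge stands in so that the example runs on its own, and the feedback wording is likewise an assumption.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical judge signature: (action description, user goals) -> aligned?
# Task Shield uses a language model for this decision; this sketch accepts any callable.
AlignmentJudge = Callable[[str, list[str]], bool]


@dataclass
class ToolCall:
    name: str
    args: dict


def keyword_judge(action: str, goals: list[str]) -> bool:
    """Toy stand-in judge: treats an action as aligned if it shares a word
    longer than three letters with any user goal. Not robust, illustration only."""
    action_words = set(re.findall(r"[a-z]+", action.lower()))
    return any(
        word in action_words
        for goal in goals
        for word in re.findall(r"[a-z]+", goal.lower())
        if len(word) > 3
    )


@dataclass
class TaskShieldSketch:
    user_goals: list[str]
    judge: AlignmentJudge
    blocked: list[ToolCall] = field(default_factory=list)

    def check(self, call: ToolCall) -> tuple[bool, str]:
        """Return (allowed, message). Misaligned calls are recorded and
        answered with feedback rather than being executed."""
        description = f"{call.name}({call.args})"
        if self.judge(description, self.user_goals):
            return True, "aligned"
        self.blocked.append(call)
        feedback = (
            f"Action {description} does not contribute to the user's goals "
            f"{self.user_goals}; choose an action that does."
        )
        return False, feedback
```

Usage: with the goal "Summarize the unread emails in my inbox", a call to `read_inbox` passes. A `send_money` call of the kind an injected instruction might request is blocked and recorded in `shield.blocked`. Keeping the judge as a separate callable reflects the paper's framing: the shield asks whether an action serves the user's task, not whether it looks harmful.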
Findings
Reduces the attack success rate (ASR) to 2.07%.
Maintains task utility of 69.79%.
Demonstrates effectiveness on the AgentDojo benchmark.
Abstract
Large Language Model (LLM) agents are increasingly being deployed as conversational assistants capable of performing complex real-world tasks through tool integration. This enhanced ability to interact with external systems and process various data sources, while powerful, introduces significant security vulnerabilities. In particular, indirect prompt injection attacks pose a critical threat, where malicious instructions embedded within external data sources can manipulate agents to deviate from user intentions. While existing defenses based on rule constraints, source spotlighting, and authentication protocols show promise, they struggle to maintain robust security while preserving task functionality. We propose a novel and orthogonal perspective that reframes agent security from preventing harmful actions to ensuring task alignment, requiring every agent action to serve user…
Taxonomy
Topics: Scheduling and Optimization Algorithms · Business Process Modeling and Analysis · Formal Methods in Verification
