The Task Shield: Enforcing Task Alignment to Defend Against Indirect Prompt Injection in LLM Agents

Feiran Jia; Tong Wu; Xin Qin; Anna Squicciarini

arXiv:2412.16682·cs.CR·December 24, 2024



TL;DR

The paper introduces Task Shield, a test-time defense for LLM agents that checks whether each action serves the user's goals, lowering the success rate of indirect prompt injection attacks while preserving task performance.

Contribution

It reframes agent security as task alignment and presents a test-time system that verifies each agent action contributes to the user's objectives, improving robustness against indirect prompt injection.
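To make the idea of an action-level alignment check concrete, here is a minimal sketch. It is not the paper's implementation: the `ToolCall` type, the `shield` function, and the keyword-overlap `keyword_judge` are all hypothetical names introduced for illustration, and the toy judge merely stands in for whatever alignment judgment a real system would use.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ToolCall:
    """A proposed agent action (hypothetical type): tool name plus stated intent."""
    tool: str
    intent: str


# Hypothetical judge signature: True when the action serves the user goal.
Judge = Callable[[str, ToolCall], bool]

STOP_WORDS = {"the", "a", "an", "to", "of", "and", "my", "for", "in", "on"}


def _content_words(text: str) -> set:
    """Lowercase the words, strip simple punctuation, and drop stop words."""
    return {w.strip(".,").lower() for w in text.split()} - STOP_WORDS


def keyword_judge(user_goal: str, call: ToolCall) -> bool:
    """Toy stand-in for an alignment judge (assumption): any shared content word."""
    return bool(_content_words(user_goal) & _content_words(call.intent))


def shield(
    user_goal: str, proposed: List[ToolCall], judge: Judge = keyword_judge
) -> Tuple[List[ToolCall], List[ToolCall]]:
    """Split proposed tool calls into (allowed, blocked) before anything executes."""
    allowed: List[ToolCall] = []
    blocked: List[ToolCall] = []
    for call in proposed:
        (allowed if judge(user_goal, call) else blocked).append(call)
    return allowed, blocked


goal = "Summarize the latest invoice from my inbox"
calls = [
    ToolCall("read_email", "Fetch the latest invoice email"),
    # Imagine this call was triggered by text injected into a tool result.
    ToolCall("send_money", "Transfer funds to account 1234"),
]
allowed, blocked = shield(goal, calls)
print([c.tool for c in allowed], [c.tool for c in blocked])
```

The point of the sketch is the placement of the check: every proposed action passes through the gate at test time, and the question asked is whether it contributes to the user's goal rather than whether it looks harmful.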

Findings

01 Reduces the attack success rate to 2.07%.

02 Maintains task utility at 69.79%.

03 Results are reported on the AgentDojo benchmark.
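The two headline metrics can be read as simple rates over evaluation episodes. The sketch below assumes one common reading: attack success rate is the share of attacked episodes in which the injected goal was carried out, and utility is the share in which the user's task was completed. The benchmark's exact definitions may differ, and the `Episode` type and the four sample episodes are made up for illustration; they are not data from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Episode:
    """One evaluation run under attack (hypothetical record type)."""
    user_task_done: bool
    attack_succeeded: bool


def _as_nonempty_list(episodes: Iterable[Episode]) -> List[Episode]:
    eps = list(episodes)
    if not eps:
        raise ValueError("need at least one episode")
    return eps


def attack_success_rate(episodes: Iterable[Episode]) -> float:
    """Percentage of episodes where the injected goal was achieved (assumed definition)."""
    eps = _as_nonempty_list(episodes)
    return 100.0 * sum(e.attack_succeeded for e in eps) / len(eps)


def utility(episodes: Iterable[Episode]) -> float:
    """Percentage of episodes where the user's task was completed (assumed definition)."""
    eps = _as_nonempty_list(episodes)
    return 100.0 * sum(e.user_task_done for e in eps) / len(eps)


# Illustrative episodes only, not results from the paper.
sample = [
    Episode(user_task_done=True, attack_succeeded=False),
    Episode(user_task_done=True, attack_succeeded=False),
    Episode(user_task_done=False, attack_succeeded=True),
    Episode(user_task_done=True, attack_succeeded=False),
]
asr = attack_success_rate(sample)
util = utility(sample)
```

A defense is judged on both numbers together: a low attack success rate is only useful if utility stays high, since blocking every action would trivially stop attacks while completing no tasks.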

Abstract

Large Language Model (LLM) agents are increasingly being deployed as conversational assistants capable of performing complex real-world tasks through tool integration. This enhanced ability to interact with external systems and process various data sources, while powerful, introduces significant security vulnerabilities. In particular, indirect prompt injection attacks pose a critical threat, where malicious instructions embedded within external data sources can manipulate agents to deviate from user intentions. While existing defenses based on rule constraints, source spotlighting, and authentication protocols show promise, they struggle to maintain robust security while preserving task functionality. We propose a novel and orthogonal perspective that reframes agent security from preventing harmful actions to ensuring task alignment, requiring every agent action to serve user…


Taxonomy

Topics: Scheduling and Optimization Algorithms · Business Process Modeling and Analysis · Formal Methods in Verification