InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated   Large Language Model Agents

Qiusi Zhan; Zhixiang Liang; Zifan Ying; Daniel Kang

arXiv:2403.02691·cs.CL·August 6, 2024·5 cites

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents

Qiusi Zhan, Zhixiang Liang, Zifan Ying, Daniel Kang

PDF

Open Access 2 Repos 1 Video

TL;DR

InjecAgent is a comprehensive benchmark that evaluates the vulnerability of tool-integrated large language model agents to indirect prompt injection attacks, revealing significant security risks and the need for mitigation strategies.

Contribution

This work introduces InjecAgent, the first benchmark specifically designed to assess IPI attack vulnerabilities in LLM agents across multiple tools and attacker scenarios.

Findings

01

Agents are vulnerable to IPI attacks, with GPT-4 being compromised 24% of the time.

02

Reinforcing attacker instructions with hacking prompts increases attack success rates.

03

The benchmark covers 1,054 test cases across 17 user tools and 62 attacker tools.

Abstract

Recent work has embodied LLMs as agents, allowing them to access tools, perform actions, and interact with external content (e.g., emails or websites). However, external content introduces the risk of indirect prompt injection (IPI) attacks, where malicious instructions are embedded within the content processed by LLMs, aiming to manipulate these agents into executing detrimental actions against users. Given the potentially severe consequences of such attacks, establishing benchmarks to assess and mitigate these risks is imperative. In this work, we introduce InjecAgent, a benchmark designed to assess the vulnerability of tool-integrated LLM agents to IPI attacks. InjecAgent comprises 1,054 test cases covering 17 different user tools and 62 attacker tools. We categorize attack intentions into two primary types: direct harm to users and exfiltration of private data. We evaluate 30…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

Videos

InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents· underline

Taxonomy

TopicsTopic Modeling · Natural Language Processing Techniques · Business Process Modeling and Analysis

MethodsAttention Is All You Need · Linear Layer · Byte Pair Encoding · Multi-Head Attention · Layer Normalization · Dropout · Softmax · Dense Connections · Label Smoothing · Adam