Not what you've signed up for: Compromising Real-World LLM-Integrated   Applications with Indirect Prompt Injection

Kai Greshake; Sahar Abdelnabi; Shailesh Mishra; Christoph Endres,; Thorsten Holz; Mario Fritz

arXiv:2302.12173·cs.CR·May 8, 2023·40 cites

Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Kai Greshake, Sahar Abdelnabi, Shailesh Mishra, Christoph Endres,, Thorsten Holz, Mario Fritz

PDF

Open Access 2 Repos 1 Models 5 Datasets

TL;DR

This paper uncovers new security vulnerabilities in LLM-integrated applications through indirect prompt injection, demonstrating practical attacks and emphasizing the need for robust defenses to ensure safe deployment.

Contribution

It introduces the concept of indirect prompt injection attacks, providing a comprehensive taxonomy and demonstrating real-world vulnerabilities in popular LLM applications.

Findings

01

Practical attacks on Bing's GPT-4 and synthetic systems.

02

Demonstration of arbitrary code execution via prompt processing.

03

Identification of new security risks like data theft and ecosystem contamination.

Abstract

Large Language Models (LLMs) are increasingly being integrated into various applications. The functionalities of recent LLMs can be flexibly modulated via natural language prompts. This renders them susceptible to targeted adversarial prompting, e.g., Prompt Injection (PI) attacks enable attackers to override original instructions and employed controls. So far, it was assumed that the user is directly prompting the LLM. But, what if it is not the user prompting? We argue that LLM-Integrated Applications blur the line between data and instructions. We reveal new attack vectors, using Indirect Prompt Injection, that enable adversaries to remotely (without a direct interface) exploit LLM-integrated applications by strategically injecting prompts into data likely to be retrieved. We derive a comprehensive taxonomy from a computer security perspective to systematically investigate impacts…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

Models

🤗
DavidTKeane/cyberranger-v42
model· 51 dl· ♡ 1
51 dl♡ 1

Datasets

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling · Web Application Security Vulnerabilities · Access Control and Trust