Defending Against Indirect Prompt Injection Attacks With Spotlighting

Keegan Hines; Gary Lopez; Matthew Hall; Federico Zarfati; Yonatan; Zunger; Emre Kiciman

arXiv:2403.14720·cs.CR·March 25, 2024·5 cites

Defending Against Indirect Prompt Injection Attacks With Spotlighting

Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan, Zunger, Emre Kiciman

PDF

Open Access 1 Repo

TL;DR

This paper introduces spotlighting, a prompt engineering technique that enhances large language models' ability to identify input sources, significantly reducing the success of indirect prompt injection attacks with minimal impact on NLP tasks.

Contribution

The paper presents a novel spotlighting approach that improves source attribution in LLMs, effectively defending against prompt injection attacks.

Findings

01

Spotlighting reduces attack success rate from over 50% to below 2%.

02

Minimal impact on the performance of underlying NLP tasks.

03

Robust defense demonstrated across GPT-family models.

Abstract

Large Language Models (LLMs), while powerful, are built and trained to process a single text input. In common applications, multiple inputs can be processed by concatenating them together into a single stream of text. However, the LLM is unable to distinguish which sections of prompt belong to various input sources. Indirect prompt injection attacks take advantage of this vulnerability by embedding adversarial instructions into untrusted data being processed alongside user commands. Often, the LLM will mistake the adversarial instructions as user commands to be followed, creating a security vulnerability in the larger system. We introduce spotlighting, a family of prompt engineering techniques that can be used to improve LLMs' ability to distinguish among multiple sources of input. The key insight is to utilize transformations of an input to provide a reliable and continuous signal of…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

microsoft/llmail-inject-challenge
none

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsCryptography and Data Security · Security and Verification in Computing · Formal Methods in Verification