Defending Against Indirect Prompt Injection Attacks With Spotlighting
Keegan Hines, Gary Lopez, Matthew Hall, Federico Zarfati, Yonatan, Zunger, Emre Kiciman

TL;DR
This paper introduces spotlighting, a prompt engineering technique that enhances large language models' ability to identify input sources, significantly reducing the success of indirect prompt injection attacks with minimal impact on NLP tasks.
Contribution
The paper presents a novel spotlighting approach that improves source attribution in LLMs, effectively defending against prompt injection attacks.
Findings
Spotlighting reduces attack success rate from over 50% to below 2%.
Minimal impact on the performance of underlying NLP tasks.
Robust defense demonstrated across GPT-family models.
Abstract
Large Language Models (LLMs), while powerful, are built and trained to process a single text input. In common applications, multiple inputs can be processed by concatenating them together into a single stream of text. However, the LLM is unable to distinguish which sections of prompt belong to various input sources. Indirect prompt injection attacks take advantage of this vulnerability by embedding adversarial instructions into untrusted data being processed alongside user commands. Often, the LLM will mistake the adversarial instructions as user commands to be followed, creating a security vulnerability in the larger system. We introduce spotlighting, a family of prompt engineering techniques that can be used to improve LLMs' ability to distinguish among multiple sources of input. The key insight is to utilize transformations of an input to provide a reliable and continuous signal of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsCryptography and Data Security · Security and Verification in Computing · Formal Methods in Verification
