Prompt Leakage effect and defense strategies for multi-turn LLM interactions

Divyansh Agarwal; Alexander R. Fabbri; Ben Risher; Philippe Laban; Shafiq Joty; Chien-Sheng Wu

arXiv:2404.16251 · cs.CR · July 30, 2024 · 1 citation

Open Access

TL;DR

This paper systematically investigates prompt leakage vulnerabilities in multi-turn LLM interactions, evaluates attack success rates, and proposes defense strategies to enhance security and privacy in LLM applications.

Contribution

It introduces a novel threat model leveraging LLM sycophancy, evaluates leakage across multiple models and domains, and assesses various defense strategies including finetuning and black-box methods.

Findings

01

Under the sycophancy-based threat model, the average attack success rate (ASR) rose from 17.7% to 86.2% in the multi-turn setting.

02

Different defenses can mitigate prompt leakage with varying costs.

03

Analysis provides insights for building secure multi-turn LLM systems.
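
The attack success rate cited in the findings can be illustrated with a minimal sketch. It assumes ASR is the fraction of attacked conversations in which at least one model reply leaks prompt content, and it abstracts the leakage detector into boolean flags. The `Conversation` type, the `attack_success_rate` function, and the toy data are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Conversation:
    """One attacked conversation; each flag records whether that turn's reply leaked."""
    leaked_turn1: bool
    leaked_turn2: bool


def attack_success_rate(conversations, multi_turn=True):
    """Return the fraction of conversations with at least one leaking reply.

    With multi_turn=False only the first turn is counted (assumed definition).
    """
    if not conversations:
        raise ValueError("need at least one conversation")
    successes = sum(
        c.leaked_turn1 or (multi_turn and c.leaked_turn2) for c in conversations
    )
    return successes / len(conversations)


# Toy data invented for illustration only; these are not results from the paper.
toy = [
    Conversation(True, False),
    Conversation(False, True),
    Conversation(False, True),
    Conversation(False, False),
]
single_turn_asr = attack_success_rate(toy, multi_turn=False)
multi_turn_asr = attack_success_rate(toy, multi_turn=True)
print(f"single-turn ASR: {single_turn_asr:.2f}, multi-turn ASR: {multi_turn_asr:.2f}")
```

Counting a conversation as a success when any turn leaks is one plausible reading of a multi-turn ASR; a per-turn rate would be computed differently.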

Abstract

Prompt leakage poses a compelling security and privacy threat in LLM applications. Leakage of system prompts may compromise intellectual property, and act as adversarial reconnaissance for an attacker. A systematic evaluation of prompt leakage threats and mitigation strategies is lacking, especially for multi-turn LLM interactions. In this paper, we systematically investigate LLM vulnerabilities against prompt leakage for 10 closed- and open-source LLMs, across four domains. We design a unique threat model which leverages the LLM sycophancy effect and elevates the average attack success rate (ASR) from 17.7% to 86.2% in a multi-turn setting. Our standardized setup further allows dissecting leakage of specific prompt contents such as task instructions and knowledge documents. We measure the mitigation effect of 7 black-box defense strategies, along with finetuning an open-source model to…

Taxonomy

Methods: Attention Is All You Need · Weight Decay · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Label Smoothing · Adam · Linear Warmup With Linear Decay