Has My System Prompt Been Used? Large Language Model Prompt Membership Inference
Roman Levin, Valeriia Cherepanova, Abhimanyu Hans, Avi Schwarzschild, Tom Goldstein

TL;DR
This paper introduces Prompt Detective, a statistical method to determine if a specific system prompt was used with a large language model, highlighting privacy concerns in prompt engineering.
Contribution
It presents a novel statistical approach for prompt membership inference, enabling detection of prompt usage through output distribution analysis.
Findings
Prompt Detective reliably infers prompt membership across various models.
Minor prompt changes lead to distinguishable output distribution differences.
The method demonstrates high statistical significance in prompt verification.
Abstract
Prompt engineering has emerged as a powerful technique for optimizing large language models (LLMs) for specific applications, enabling faster prototyping and improved performance, and giving rise to the interest of the community in protecting proprietary system prompts. In this work, we explore a novel perspective on prompt privacy through the lens of membership inference. We develop Prompt Detective, a statistical method to reliably determine whether a given system prompt was used by a third-party language model. Our approach relies on a statistical test comparing the distributions of two groups of model outputs corresponding to different system prompts. Through extensive experiments with a variety of language models, we demonstrate the effectiveness of Prompt Detective for prompt membership inference. Our work reveals that even minor changes in system prompts manifest in distinct…
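To make the idea of "a statistical test comparing the distributions of two groups of model outputs" concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the paper's implementation: it assumes each model output has already been turned into a fixed-length numeric vector (for example by a sentence encoder), and it assumes a permutation test on the cosine similarity between the two group means as the test statistic. The summary above does not specify either choice. The function names and the synthetic data are hypothetical.

```python
import math
import random


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def permutation_test(group_a, group_b, n_permutations=1000, seed=0):
    """Estimate a p-value for the null hypothesis that both groups of
    output vectors come from the same distribution.

    Assumed statistic: cosine similarity between the group means.
    A small p-value means the observed similarity is unusually low
    compared with random regroupings of the pooled outputs.
    """
    rng = random.Random(seed)
    observed = cosine_similarity(mean_vector(group_a), mean_vector(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled = cosine_similarity(
            mean_vector(pooled[:n_a]), mean_vector(pooled[n_a:])
        )
        if shuffled <= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


def synthetic_embeddings(center, n, rng, noise=0.1):
    """Hypothetical stand-in for embedded model outputs: Gaussian noise
    around a fixed center vector."""
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    # Outputs from the third-party model (unknown system prompt).
    third_party = synthetic_embeddings([1.0, 0.0, 0.0], 30, rng)
    # Outputs we generate ourselves with a candidate prompt.
    other_prompt = synthetic_embeddings([0.0, 1.0, 0.0], 30, rng)
    same_prompt = synthetic_embeddings([1.0, 0.0, 0.0], 30, rng)
    print("different prompt p-value:", permutation_test(third_party, other_prompt))
    print("same prompt p-value:", permutation_test(third_party, same_prompt))
```

In this sketch, a small p-value is read as evidence that the candidate prompt differs from the one behind the third-party outputs, while a large p-value means the test found no distinguishable difference. The significance threshold, the number of sampled outputs, and the embedding model are all choices left open here.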
Taxonomy
Topics: Topic Modeling
