Prompt Stealing Attacks Against Large Language Models
Zeyang Sha, Yang Zhang

TL;DR
This paper introduces prompt stealing attacks against large language models, demonstrating how well-designed prompts can be extracted from generated answers, highlighting new security concerns in prompt engineering.
Contribution
The paper proposes a novel prompt stealing attack framework with modules for parameter extraction and prompt reconstruction, revealing vulnerabilities in LLM prompt security.
Findings
Effective prompt stealing demonstrated on LLMs
High accuracy in identifying prompt types and reconstructing prompts
Highlights security risks in prompt engineering practices
Abstract
The increasing reliance on large language models (LLMs) such as ChatGPT in various fields emphasizes the importance of "prompt engineering," a technique for improving the quality of model outputs. With companies investing heavily in expert prompt engineers and educational resources emerging to meet market demand, designing high-quality prompts has become an intriguing challenge. In this paper, we propose a novel attack against LLMs, named prompt stealing attacks. Our proposed prompt stealing attack aims to steal these well-designed prompts based on the generated answers. The prompt stealing attack contains two primary modules: the parameter extractor and the prompt reconstructor. The goal of the parameter extractor is to figure out the properties of the original prompts. We first observe that most prompts fall into one of three categories: direct prompt, role-based prompt, and…
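To make the two-module structure concrete, here is a minimal sketch of how the pieces fit together. It is a toy illustration under stated assumptions, not the authors' implementation. A keyword lookup stands in for the trained classifier a real parameter extractor would need. The "reconstructor" only builds the text of a reverse-prompting query and does not call any model. All names (`PromptType`, `extract_parameters`, `build_reverse_query`, `ROLE_CUES`) are hypothetical. The third prompt category is cut off in this excerpt, so the sketch models only direct and role-based prompts.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PromptType(Enum):
    # Only the two categories named in the excerpt; the third is truncated there.
    DIRECT = "direct"
    ROLE_BASED = "role-based"


@dataclass
class ExtractedParams:
    prompt_type: PromptType
    role: Optional[str]  # e.g. "teacher" for a role-based prompt, else None


# Hypothetical cue phrases; a real extractor would be a trained classifier.
ROLE_CUES = {"as a teacher": "teacher", "as a doctor": "doctor", "as a lawyer": "lawyer"}


def extract_parameters(answer: str) -> ExtractedParams:
    """Toy parameter extractor: guess prompt properties from the answer text."""
    text = answer.lower()
    for cue, role in ROLE_CUES.items():
        if cue in text:
            return ExtractedParams(PromptType.ROLE_BASED, role)
    # Fallback assumption: anything without a role cue is treated as direct.
    return ExtractedParams(PromptType.DIRECT, None)


def build_reverse_query(answer: str, params: ExtractedParams) -> str:
    """Toy prompt reconstructor: build a query that conditions on the extracted
    parameters. Sending it to an LLM is outside the scope of this sketch."""
    if params.prompt_type is PromptType.ROLE_BASED:
        hint = f"a role-based prompt (role: {params.role})"
    else:
        hint = "a direct prompt"
    return (
        f"The following answer was generated from {hint}. "
        f"Write the prompt that most likely produced it:\n{answer}"
    )


if __name__ == "__main__":
    answer = "As a teacher, I would explain photosynthesis step by step."
    params = extract_parameters(answer)
    print(params)
    print(build_reverse_query(answer, params))
```

The design choice worth noting is the split itself: the extractor's output (prompt type, role) is passed explicitly into the reconstruction step, so a better classifier can be swapped in without touching the reconstruction logic.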
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Natural Language Processing Techniques
