Prompt Stealing Attacks Against Large Language Models

Zeyang Sha; Yang Zhang

arXiv:2402.12959 · cs.CR · February 21, 2024 · 6 cites

Open Access

TL;DR

This paper introduces prompt stealing attacks against large language models, demonstrating how well-designed prompts can be extracted from generated answers, highlighting new security concerns in prompt engineering.

Contribution

The paper proposes a novel prompt stealing attack framework with modules for parameter extraction and prompt reconstruction, revealing vulnerabilities in LLM prompt security.

Findings

01 Effective prompt stealing demonstrated on LLMs

02 High accuracy in identifying prompt types and reconstructing prompts

03 Highlights security risks in prompt engineering practices

Abstract

The increasing reliance on large language models (LLMs) such as ChatGPT in various fields underscores the importance of "prompt engineering," a technique for improving the quality of model outputs. With companies investing significantly in expert prompt engineers and educational resources growing to meet market demand, designing high-quality prompts has become an intriguing challenge. In this paper, we propose a novel attack against LLMs, the prompt stealing attack, which aims to steal these well-designed prompts based on the generated answers. The attack contains two primary modules: the parameter extractor and the prompt reconstructor. The goal of the parameter extractor is to determine the properties of the original prompts. We first observe that most prompts fall into one of three categories: direct prompt, role-based prompt, and…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Natural Language Processing Techniques