Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications
Junlin Wang, Tianyi Yang, Roy Xie, Bhuwan Dhingra

TL;DR
The paper introduces Raccoon, a comprehensive benchmark for evaluating the vulnerability of LLMs to prompt extraction attacks, including diverse attack types and defenses, to improve robustness assessment.
Contribution
It presents the first extensive benchmark evaluating LLM susceptibility to prompt theft, with novel dual-scenario assessment and a wide range of attack and defense strategies.
Findings
Models are generally vulnerable without defenses.
OpenAI models show resilience with proper defenses.
The benchmark covers 14 attack categories and defense mechanisms.
Abstract
With the proliferation of LLM-integrated applications such as GPT-s, millions are deployed, offering valuable services through proprietary instruction prompts. These systems, however, are prone to prompt extraction attacks through meticulously designed queries. To help mitigate this problem, we introduce the Raccoon benchmark which comprehensively evaluates a model's susceptibility to prompt extraction attacks. Our novel evaluation method assesses models under both defenseless and defended scenarios, employing a dual approach to evaluate the effectiveness of existing defenses and the resilience of the models. The benchmark encompasses 14 categories of prompt extraction attacks, with additional compounded attacks that closely mimic the strategies of potential attackers, alongside a diverse collection of defense templates. This array is, to our knowledge, the most extensive compilation of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsFault Detection and Control Systems · Advanced Data Processing Techniques · Neural Networks and Applications
