Maatphor: Automated Variant Analysis for Prompt Injection Attacks

Ahmed Salem; Andrew Paverd; Boris Köpf

arXiv:2312.11513 · cs.CR · December 20, 2023 · 2 cites

TL;DR

Maatphor is a tool that automates the generation and evaluation of prompt injection variants to improve defenses against evolving security threats in large language models.

Contribution

It introduces an automated method for generating and assessing prompt injection variants, aiding in defense development and dataset creation.

Findings

1. Generates effective prompt variants within 40 iterations
2. Achieves at least 60% effectiveness in tests
3. Assists in dataset creation for attack analysis

Abstract

Prompt injection has emerged as a serious security threat to large language models (LLMs). The current best practice for defending against newly discovered prompt injection techniques is to add guardrails to the system (e.g., by updating the system prompt or using classifiers on the input and/or output of the model). However, in the same way that variants of a piece of malware are created to evade anti-virus software, variants of a prompt injection can be created to evade the LLM's guardrails. Ideally, when a new prompt injection technique is discovered, candidate defenses should be tested not only against the successful prompt injection, but also against possible variants. In this work, we present Maatphor, a tool to assist defenders in performing automated variant analysis of known prompt injection attacks. This involves solving two main challenges: (1) automatically…

Taxonomy

Topics: Software Engineering Research · Advanced Malware Detection Techniques · Web Application Security Vulnerabilities