Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack
Mark Russinovich, Ahmed Salem, Ronen Eldan

TL;DR
This paper introduces Crescendo, a multi-turn jailbreak attack on LLMs that gradually escalates prompts to bypass alignment, demonstrating high success rates across various models and tasks, and includes an automated tool called Crescendomation.
Contribution
The paper presents Crescendo, a novel multi-turn jailbreak method that effectively bypasses model alignments, along with Crescendomation, an automation tool that outperforms existing techniques.
Findings
Crescendo achieves high success rates across multiple LLMs.
Crescendomation outperforms other jailbreak tools on AdvBench.
Crescendo can jailbreak multimodal models.
Abstract
Large Language Models (LLMs) have risen significantly in popularity and are increasingly being adopted across multiple applications. These LLMs are heavily aligned to resist engaging in illegal or unethical topics as a means to avoid contributing to responsible AI harms. However, a recent line of attacks, known as jailbreaks, seek to overcome this alignment. Intuitively, jailbreak attacks aim to narrow the gap between what the model can do and what it is willing to do. In this paper, we introduce a novel jailbreak attack called Crescendo. Unlike existing jailbreak methods, Crescendo is a simple multi-turn jailbreak that interacts with the model in a seemingly benign manner. It begins with a general prompt or question about the task at hand and then gradually escalates the dialogue by referencing the model's replies progressively leading to a successful jailbreak. We evaluate Crescendo…
