Many-Turn Jailbreaking
Xianjun Yang, Liqiang Xiao, Shiyang Li, Faisal Ladhak, Hyokun Yun, Linda Ruth Petzold, Yi Xu, William Yang Wang

TL;DR
This paper introduces the concept of multi-turn jailbreaking for large language models, highlighting a new safety threat where models can be manipulated over multiple interactions, and presents a benchmark to evaluate this vulnerability.
Contribution
It pioneers the exploration of multi-turn jailbreaking, constructs the MTJ-Bench benchmark, and provides initial insights into this emerging safety concern.
Findings
Multi-turn jailbreaking poses a significant safety threat.
The MTJ-Bench benchmark evaluates various models' vulnerabilities.
Initial results reveal models' susceptibility to multi-turn jailbreaking.
Abstract
Current jailbreaking work on large language models (LLMs) aims to elicit unsafe outputs from given prompts. However, it only focuses on single-turn jailbreaking targeting one specific query. On the contrary, the advanced LLMs are designed to handle extremely long contexts and can thus conduct multi-turn conversations. So, we propose exploring multi-turn jailbreaking, in which the jailbroken LLMs are continuously tested on more than the first-turn conversation or a single target query. This is an even more serious threat because 1) it is common for users to continue asking relevant follow-up questions to clarify certain jailbroken details, and 2) it is also possible that the initial round of jailbreaking causes the LLMs to respond to additional irrelevant questions consistently. As the first step (First draft done at June 2024) in exploring multi-turn jailbreaking, we construct a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Adversarial Robustness in Machine Learning · Misinformation and Its Impacts
