Chain-of-Lure: A Universal Jailbreak Attack Framework using Unconstrained Synthetic Narratives

Wenhan Chang; Tianqing Zhu; Yu Zhao; Shuangyong Song; Ping Xiong; Wanlei Zhou

arXiv:2505.17519·cs.CR·March 3, 2026

Open Access

TL;DR

This paper presents a jailbreak attack framework for large language models built on unconstrained synthetic narratives. It reports high attack success rates and elevated toxicity scores in the elicited outputs, and proposes defense strategies for safer AI development.

Contribution

The paper introduces Chain-of-Lure, an attack method based on multi-turn narrative optimization. The results reveal intrinsic vulnerabilities of LLMs and inform future alignment improvements.

Findings

01 High attack success rates across diverse LLMs

02 Elevated toxicity scores in generated outputs

03 Effective defense strategies proposed

Abstract

In the era of rapid generative AI development, interactions with large language models (LLMs) pose increasing risks of misuse. Prior research has primarily focused on attacks using template-based prompts and optimization-oriented methods, while overlooking the fact that LLMs possess strong unconstrained deceptive capabilities that can be used to attack other LLMs. This paper introduces a novel jailbreaking method inspired by the Chain-of-Thought mechanism. The attacker employs mission transfer to conceal harmful user intent within dialogue and generates a progressive chain of lure questions without relying on predefined templates, enabling successful jailbreaks. To further improve the attack's strength, we incorporate a helper LLM that performs randomized narrative optimization over multi-turn interactions, enhancing attack performance while preserving alignment with the original intent. We…

Taxonomy

Topics: Topic Modeling · Hate Speech and Cyberbullying Detection · Adversarial Robustness in Machine Learning