ICON: Intent-Context Coupling for Efficient Multi-Turn Jailbreak Attack
Xingwei Lin, Wenhao Lin, Sicong Cao, Jiahao Yu, Renke Huang, Lei Xue, Chunming Wu

TL;DR
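This paper introduces ICON, a framework for multi-turn jailbreak attacks on LLMs. It constructs adversarial contexts efficiently by exploiting intent-context coupling: safety constraints relax when a malicious intent is paired with a semantically congruent context. The paper reports an average attack success rate of 97.1% across eight SOTA LLMs.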
Contribution
The paper proposes ICON, an automated multi-turn jailbreak method that uses prior-guided semantic routing and hierarchical optimization to improve attack efficiency and effectiveness.
Findings
Achieves an average Attack Success Rate of 97.1% across eight SOTA LLMs.
Effectively constructs authoritative-style contexts to bypass safety constraints.
Demonstrates superior performance over existing methods in multi-turn jailbreak attacks.
Abstract
Multi-turn jailbreak attacks have emerged as a critical threat to Large Language Models (LLMs), bypassing safety mechanisms by progressively constructing adversarial contexts from scratch and incrementally refining prompts. However, existing methods suffer from the inefficiency of incremental context construction that requires step-by-step LLM interaction, and often stagnate in suboptimal regions due to surface-level optimization. In this paper, we characterize the Intent-Context Coupling phenomenon, revealing that LLM safety constraints are significantly relaxed when a malicious intent is coupled with a semantically congruent context pattern. Driven by this insight, we propose ICON, an automated multi-turn jailbreak framework that efficiently constructs an authoritative-style context via prior-guided semantic routing. Specifically, ICON first routes the malicious intent to a congruent…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks · Topic Modeling
