Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion
Tiehan Cui, Yanxu Mao, Peipei Liu, Congying Liu, Datao You

TL;DR
This paper introduces ICE, a novel black-box jailbreak method using intent concealment and diversion, achieving high success rates with fewer queries, and presents BiSceneEval, a dataset for evaluating LLM robustness across tasks.
Contribution
The paper proposes ICE, a new efficient jailbreak technique, and BiSceneEval, a dataset for assessing LLM security in diverse text-generation scenarios.
Findings
ICE achieves high attack success rates with a single query.
ICE demonstrates improved transferability across different models.
Experimental results reveal vulnerabilities in current LLM defenses.
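The findings above are stated in terms of attack success rate and query count. As a rough illustration only, the sketch below computes both from a list of evaluation records. The record schema, the function names, and the sample values are hypothetical and are not taken from the paper, whose exact metric definitions are not given in this summary.

```python
from dataclasses import dataclass


@dataclass
class AttackRecord:
    """One evaluated attempt against one target model (hypothetical schema)."""
    model: str        # name of the target model
    succeeded: bool   # whether a judge marked the response as a successful attack
    queries: int      # number of queries spent on this attempt


def attack_success_rate(records):
    """Fraction of attempts marked successful (assumed definition of ASR)."""
    if not records:
        raise ValueError("no records to evaluate")
    return sum(r.succeeded for r in records) / len(records)


def average_queries(records):
    """Mean number of queries per attempt."""
    if not records:
        raise ValueError("no records to evaluate")
    return sum(r.queries for r in records) / len(records)


# Made-up placeholder records, not results from the paper.
example_records = [
    AttackRecord("model-a", True, 1),
    AttackRecord("model-a", False, 1),
    AttackRecord("model-b", True, 1),
    AttackRecord("model-b", True, 1),
]

if __name__ == "__main__":
    print("ASR:", attack_success_rate(example_records))
    print("avg queries:", average_queries(example_records))
```

Grouping the records by `model` before calling these functions would give a per-model view, which is one simple way to compare transferability across targets.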
Abstract
Although large language models (LLMs) have achieved remarkable advancements, their security remains a pressing concern. One major threat is jailbreak attacks, in which adversarial prompts bypass model safeguards to elicit harmful or objectionable content. Researchers study jailbreak attacks to understand the security and robustness of LLMs. However, existing jailbreak attack methods face two main challenges: (1) an excessive number of iterative queries, and (2) poor generalization across models. In addition, recent jailbreak evaluation datasets focus primarily on question-answering scenarios and pay little attention to text generation tasks that require accurate regeneration of toxic content. To tackle these challenges, we propose two contributions: (1) ICE, a novel black-box jailbreak method that employs Intent Concealment and divErsion to effectively circumvent security constraints. ICE achieves…
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks
Methods: Softmax · Attention Is All You Need · Focus
