Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion

Tiehan Cui; Yanxu Mao; Peipei Liu; Congying Liu; Datao You

arXiv:2505.14316 · cs.CR · May 21, 2025

TL;DR

This paper introduces ICE, a novel black-box jailbreak method using intent concealment and diversion, achieving high success rates with fewer queries, and presents BiSceneEval, a dataset for evaluating LLM robustness across tasks.

Contribution

The paper proposes ICE, a new efficient jailbreak technique, and BiSceneEval, a dataset for assessing LLM security in diverse text-generation scenarios.

Findings

1. ICE achieves high attack success rates with a single query.

2. ICE demonstrates improved transferability across different models.

3. Experimental results reveal vulnerabilities in current LLM defenses.

Abstract

Although large language models (LLMs) have achieved remarkable advancements, their security remains a pressing concern. One major threat is jailbreak attacks, where adversarial prompts bypass model safeguards to generate harmful or objectionable content. Researchers study jailbreak attacks to understand the security and robustness of LLMs. However, existing jailbreak attack methods face two main challenges: (1) an excessive number of iterative queries, and (2) poor generalization across models. In addition, recent jailbreak evaluation datasets focus primarily on question-answering scenarios, lacking attention to text generation tasks that require accurate regeneration of toxic content. To tackle these challenges, we propose two contributions: (1) ICE, a novel black-box jailbreak method that employs Intent Concealment and divErsion to effectively circumvent security constraints. ICE achieves…

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks

Methods: Softmax · Attention Is All You Need · Focus