Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space

Yao Huang; Yitong Sun; Shouwei Ruan; Yichi Zhang; Yinpeng Dong; Xingxing Wei

arXiv:2505.21277·cs.CR·May 29, 2025

Open Access · 1 Repo

TL;DR

This paper introduces a framework that raises jailbreak attack success rates on large language models by expanding and optimizing the strategy space, exposing vulnerabilities in models previously considered secure.

Contribution

It proposes an approach that combines attack-strategy decomposition with genetic optimization, enabling more effective black-box jailbreak attacks against safety-aligned models.

Findings

01. Achieves over 90% success rate on Claude-3.5

02. Outperforms prior methods in effectiveness

03. Demonstrates strong transferability across models

Abstract

Large Language Models (LLMs), despite advanced general capabilities, still suffer from numerous safety risks, especially jailbreak attacks that bypass safety protocols. Understanding these vulnerabilities through black-box jailbreak attacks, which better reflect real-world scenarios, offers critical insights into model robustness. While existing methods have shown improvements through various prompt engineering techniques, their success remains limited against safety-aligned models, overlooking a more fundamental problem: the effectiveness is inherently bounded by the predefined strategy spaces. However, expanding this space presents significant challenges in both systematically capturing essential attack patterns and efficiently navigating the increased complexity. To better explore the potential of expanding the strategy space, we address these challenges through a novel framework…

Code & Models

Repositories

aries-iai/cl-gso (Official)

Taxonomy

Topics: Cybercrime and Law Enforcement Studies