EvoJail: Evolutionary Diverse Jailbreak Prompt Generation for Large Language Models

Rui Tang; Kaiyu Xu; Pengsen Cheng; Hao Ren; Haizhou Wang; Shuyu Jiang

arXiv:2605.02921 · cs.NE · May 6, 2026


TL;DR

EvoJail is an evolutionary framework for generating diverse and adaptable jailbreak prompts for large language models, improving safety testing across model versions.

Contribution

It introduces a multi-objective evolutionary approach with instruction fusion and diversity-aware objectives to enhance jailbreak prompt diversity and adaptability.

Findings

1. Achieves an attack success rate of over 93%.
2. Improves diversity metrics by more than 5.6%.
3. Outperforms state-of-the-art methods in adaptability and diversity.

Abstract

As LLMs continue to shape real-world applications, automated jailbreak generation has become essential for revealing safety weaknesses and guiding model improvement. Existing automatic jailbreak generation methods have not yet fully addressed two important aspects: adaptability to evolving safety-finetuned models, which determines their effectiveness on newer model versions, and diversity of the generated prompts, whose absence leads to narrow or repetitive attack patterns. To address these issues, we propose EvoJail, an instruction-fusion-driven evolutionary jailbreak generation framework that formalizes jailbreak prompt generation as a multi-objective black-box optimization problem and applies the principles of evolutionary algorithms to search for jailbreak prompts that adapt across different model versions and exhibit diverse attack patterns. Specifically, EvoJail integrates jailbreak prompt…
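Multi-objective optimization, as framed in the abstract, generally means keeping candidates that are not dominated on any objective rather than ranking them by a single score. The sketch below shows only that generic Pareto-dominance step over abstract objective vectors (for example, one effectiveness score and one diversity score per candidate, both maximized). The function names `dominates` and `pareto_front` are hypothetical, and the paper's actual objectives, fitness functions, and selection operator are not given in this excerpt.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` under maximization.

    `a` must be at least as good as `b` on every objective and strictly
    better on at least one. (Hypothetical helper, not from the paper.)
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of candidates that no other candidate dominates.

    Each entry of `scores` is an assumed objective vector for one candidate.
    This is a simple O(n^2) filter, not a full non-dominated sort.
    """
    front = []
    for i, s in enumerate(scores):
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i):
            front.append(i)
    return front


if __name__ == "__main__":
    # Illustrative two-objective scores; values are made up for the example.
    population = [(0.9, 0.2), (0.5, 0.6), (0.4, 0.5), (0.9, 0.1)]
    print(pareto_front(population))  # prints [0, 1]
```

Candidate 2 is dropped because candidate 1 is better on both objectives, and candidate 3 is dropped because candidate 0 ties on the first objective and wins on the second. Keeping the whole front, instead of a single best candidate, is what lets a multi-objective search retain trade-offs between objectives.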
