Evolve the Method, Not the Prompts: Evolutionary Synthesis of Jailbreak Attacks on LLMs

Yunhao Chen; Xin Wang; Juncheng Li; Yixu Wang; Jie Li; Yan Teng; Yingchun Wang; Xingjun Ma

arXiv:2511.12710 · cs.CL · May 19, 2026

TL;DR

EvoSynth introduces an evolutionary framework that optimizes executable code for generating more effective and diverse jailbreak attacks on large language models, surpassing prompt-based methods.

Contribution

EvoSynth shifts attack optimization from prompt space to code space, enabling autonomous evolution and self-correction of attack algorithms.

Findings

01 Achieves an 85.5% attack success rate against robust models

02 Attains a 95.9% average success rate across targets

03 Generates more diverse attacks than existing prompt-based methods

Abstract

Automated red teaming frameworks for Large Language Models (LLMs) have become increasingly sophisticated, yet many still formulate attack optimization primarily in the prompt space. In other words, these methods mainly search for better attack wording or better strategy choices, but they do not search over executable code. By moving the search into code space, we can optimize not only the final attack prompt, but also the procedure that generates it, including execution flow, reusable logic, branching, and failure-driven repair. To close this gap, we introduce EvoSynth, an autonomous multi-agent framework that shifts the optimization space from prompts to executable code. Instead of refining prompts directly, EvoSynth employs a multi-agent system to autonomously engineer, evolve, and execute code-based attack algorithms. Crucially, it features a code-level self-correction loop,…
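The general idea of searching over procedures rather than over their outputs, with failure-driven repair, can be illustrated on a deliberately unrelated toy problem. The sketch below is not EvoSynth's implementation and involves no LLMs or agents: it evolves short arithmetic programs toward a target function, and when a candidate crashes at run time it patches the failing step and retries. Every name here (`OPS`, `CASES`, `StepFailure`, `repair`, `evolve`) is a hypothetical choice made for illustration.

```python
import random

# Toy step library (hypothetical). "inv" can fail at run time.
OPS = {
    "inc": lambda x: x + 1,
    "double": lambda x: x * 2,
    "square": lambda x: x * x,
    "inv": lambda x: 1 // x,  # raises ZeroDivisionError when x == 0
}
SAFE_OPS = ["inc", "double", "square"]
CASES = [(x, (x + 1) * 2) for x in range(5)]  # toy target behaviour


class StepFailure(Exception):
    """Records which step of a program failed."""

    def __init__(self, step):
        super().__init__(f"step {step} failed")
        self.step = step


def run(program, x):
    """Execute a program (a list of op names) on one input."""
    for step, name in enumerate(program):
        try:
            x = OPS[name](x)
        except ZeroDivisionError:
            raise StepFailure(step) from None
    return x


def repair(program, rng):
    """Failure-driven repair: swap out the step that crashed, then retry."""
    program = list(program)
    while True:
        try:
            for x, _ in CASES:
                run(program, x)
            return program
        except StepFailure as failure:
            program[failure.step] = rng.choice(SAFE_OPS)


def fitness(program):
    """Negative total error over the toy cases (0 is a perfect score)."""
    return -sum(abs(run(program, x) - want) for x, want in CASES)


def mutate(program, rng):
    """Change one step, or append one (up to 4 steps), to get a new candidate."""
    child = list(program)
    if child and rng.random() < 0.7:
        child[rng.randrange(len(child))] = rng.choice(list(OPS))
    elif len(child) < 4:
        child.append(rng.choice(list(OPS)))
    return child


def evolve(generations=30, pop_size=20, seed=0):
    """Evolve programs; every candidate is repaired before it is scored."""
    rng = random.Random(seed)
    population = [
        repair([rng.choice(list(OPS)) for _ in range(rng.randint(1, 3))], rng)
        for _ in range(pop_size)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 4]  # keep the best quarter unchanged
        children = [
            repair(mutate(rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    return max(population, key=fitness)


if __name__ == "__main__":
    best = evolve()
    print(best, fitness(best))
```

The point of the sketch is the shape of the loop: the unit being mutated and selected is the procedure itself, and a run-time failure becomes a signal for a targeted fix instead of a discarded candidate. In this toy, `["inc", "double"]` is one program that scores a perfect 0.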

Code & Models

Repositories

dongdongunique/EvoSynth (GitHub)

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Advanced Graph Neural Networks