SRTJ: Self-Evolving Rule-Driven Training-Free LLM Jailbreaking

Jindong Li; Ying Liu; Yali Fu; Jinjing Zhu; Leyao Wang; Menglin Yang; Rex Ying

arXiv:2605.00974 · cs.CR · May 5, 2026

TL;DR

SRTJ introduces a training-free, self-evolving framework for systematically discovering, composing, and refining jailbreak strategies against LLMs, leveraging feedback and rule organization to improve attack robustness and transferability.

Contribution

The paper presents a training-free approach that combines experience-driven attack generation, ASP-based rule selection, and a hierarchical rule memory to produce effective jailbreaks.

Findings

01. Achieves strong attack performance across different LLMs.

02. Demonstrates improved robustness and generalization over existing methods.

03. Utilizes a hierarchical rule memory for effective strategy organization.

Abstract

LLMs are increasingly equipped with safety alignment mechanisms, yet recent studies demonstrate that they remain vulnerable to jailbreaking attacks that elicit harmful behaviors without explicit policy violations. While a growing body of work has explored automated jailbreak strategies, existing methods face several fundamental challenges, including the lack of systematic utilization of both successful and failed attack experiences, as well as the absence of principled mechanisms for composing and selecting reusable attack rules under diverse constraints. As a result, existing methods struggle to accumulate transferable knowledge over time and to reliably adapt attack strategies across different targets and evolving safety mechanisms. To address these issues, we propose a Self-Evolving Rule-Driven Training-Free Jailbreak (SRTJ) framework that systematically discovers, composes, and…

Code & Models

Repositories

TheSolkatt/SRTJ (GitHub)
