AJAR: Adaptive Jailbreak Architecture for Red-teaming

Yipu Dou; Wang Yang

arXiv:2601.10971 · cs.CR · March 20, 2026

TL;DR

AJAR is a flexible framework that enhances the evaluation of LLM safety by enabling multi-turn jailbreak algorithms to be orchestrated as callable services within a tool-aware runtime, improving attack success rates and realism.

Contribution

The paper introduces AJAR, a framework that exposes multi-turn jailbreak algorithms as callable services, allowing more realistic and effective red-teaming of LLMs under agent-like conditions.

Findings

1. AJAR improves attack success rates on HarmBench behaviors.

2. AJAR achieves earlier success in multi-turn attack scenarios.

3. AJAR reproduces Crescendo more effectively than PyRIT.

Abstract

Large language model (LLM) safety evaluation is moving from content moderation to action security as modern systems gain persistent state, tool access, and autonomous control loops. Existing jailbreak frameworks still leave a gap between adaptive multi-turn attacks and agentic runtimes: attack algorithms are usually packaged as monolithic scripts, while agent harnesses rarely expose explicit abstractions for rollback, tool simulation, or strategy switching. We present AJAR, a red-teaming framework that exposes multi-turn jailbreak algorithms as callable MCP services and lets an Auditor Agent orchestrate them inside a tool-aware runtime built on Petri. AJAR integrates three representative attacks, namely Crescendo, ActorAttack, and X-Teaming, under a shared service interface for planning, prompt generation, optimization, evaluation, and context control. On 200 HarmBench validation…
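
The shared service interface described in the abstract can be pictured with a small, inert sketch. It is not the authors' implementation and does not use any MCP SDK: the names `ServiceRegistry`, `Conversation`, `OPERATIONS`, and `register_stub_algorithm` are hypothetical, the handlers are placeholders that only echo their arguments, and the checkpoint/rollback mechanism is one assumed way to read "context control".

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical operation names, loosely following the abstract's list
# (planning, prompt generation, optimization, evaluation).
OPERATIONS = ("plan", "generate_prompt", "optimize", "evaluate")


@dataclass
class Conversation:
    """Hypothetical conversation state with checkpoint/rollback (context control)."""
    turns: List[str] = field(default_factory=list)
    _checkpoints: List[int] = field(default_factory=list)

    def checkpoint(self) -> None:
        # Remember how many turns exist right now.
        self._checkpoints.append(len(self.turns))

    def rollback(self) -> None:
        # Drop every turn added since the most recent checkpoint, if any.
        if self._checkpoints:
            del self.turns[self._checkpoints.pop():]


class ServiceRegistry:
    """Hypothetical registry mapping 'algorithm.operation' names to callables."""

    def __init__(self) -> None:
        self._services: Dict[str, Callable[..., object]] = {}

    def register(self, name: str, fn: Callable[..., object]) -> None:
        self._services[name] = fn

    def call(self, name: str, **kwargs):
        if name not in self._services:
            raise KeyError(f"unknown service: {name}")
        return self._services[name](**kwargs)

    def names(self) -> List[str]:
        return sorted(self._services)


def register_stub_algorithm(registry: ServiceRegistry, algorithm: str) -> None:
    """Register inert placeholder handlers for one algorithm name."""
    for op in OPERATIONS:
        # The default argument binds the current op to each lambda.
        registry.register(
            f"{algorithm}.{op}",
            lambda op=op, **kw: {"op": op, "args": kw},
        )
```

Under these assumptions, an orchestrating agent would see every algorithm through the same set of named calls, for example `registry.call("crescendo.plan", goal="...")`, and could undo a failed branch with `Conversation.rollback()` before switching strategy.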

Taxonomy

Topics: Advanced Malware Detection Techniques · Adversarial Robustness in Machine Learning · Spam and Phishing Detection