Compositional Jailbreaking: An Empirical Analysis of Mutator Chain Interactions in Aligned LLMs

Reinelle Jan Bugnot, Soohyeon Choi, Hoon Wei Lim, and Yue Duan

arXiv:2605.15598 · cs.CR · May 18, 2026


TL;DR

This paper systematically studies how simple jailbreak transformations (mutators) interact when applied sequentially to aligned large language models: whether they reinforce one another, interfere destructively, or produce no meaningful change.

Contribution

The paper introduces a framework for mutator chaining, evaluates all ordered pairs of twelve baseline mutators against three LLMs, and shows that the effects of combined attacks are non-uniform and often destructive.

Findings

01 Most mutator combinations do not outperform individual attacks.

02 Synergistic effects are rare but can improve attack success.

03 Structural properties of safety alignment influence attack interactions.

Abstract

Jailbreaking attacks on large language models pose a significant threat to AI safety by enabling the generation of harmful or restricted content. While prior work has explored both handcrafted and automated jailbreak strategies, the potential for compositional interaction between simple attacks remains underexplored. This paper presents a systematic study of mutator chaining, in which weak jailbreak transformations are applied sequentially to characterize how they interact: whether they reinforce one another, interfere destructively, or produce no meaningful change. We implement twelve baseline mutators and evaluate all ordered pairs on a benchmark of harmful prompts against three popular LLMs. Our framework introduces metrics for completeness and validity that capture both transformation persistence and attack effectiveness. Results reveal that the interaction landscape is highly…
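
The bookkeeping behind "all ordered pairs" and the three interaction outcomes named in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the placeholder mutator names `m1`..`m12`, the helper names `ordered_pairs` and `classify_interaction`, the choice to compare a chain against the better of its two individual mutators, the `tolerance` threshold, and the exclusion of self-pairs (which gives 132 pairs; including them would give 144). The sketch contains no actual transformations and only shows how success rates might be compared.

```python
from itertools import permutations

def ordered_pairs(mutators):
    """Enumerate ordered pairs (first, second) of distinct mutators.

    Assumption: self-pairs such as (m1, m1) are excluded.
    """
    return list(permutations(mutators, 2))

def classify_interaction(rate_first, rate_second, rate_chain, tolerance=0.05):
    """Label a chain by comparing its success rate to the best single mutator.

    Assumption: the baseline is max(rate_first, rate_second), and
    `tolerance` is a hypothetical margin for "no meaningful change".
    """
    best_single = max(rate_first, rate_second)
    if rate_chain > best_single + tolerance:
        return "reinforcing"
    if rate_chain < best_single - tolerance:
        return "destructive"
    return "neutral"

# Placeholder names standing in for the twelve baseline mutators.
mutator_names = [f"m{i}" for i in range(1, 13)]
pairs = ordered_pairs(mutator_names)
print(len(pairs))  # 12 * 11 ordered pairs without self-pairs
```

Because order matters, `(m1, m2)` and `(m2, m1)` are distinct entries and would each get their own label; the rates passed to `classify_interaction` are hypothetical inputs, not results from the paper.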
