PolyJailbreak: Cross-Modal Jailbreaking Attacks on Black-Box Multimodal LLMs

Xinkai Wang; Beibei Li; Zerui Shao; Ao Liu; Guangquan Xu; Shouling Ji

arXiv:2510.17277 · cs.CR · March 10, 2026

Open Access

TL;DR

This paper shows that multimodal large language models (MLLMs) are vulnerable to black-box jailbreak attacks. It introduces PolyJailbreak, a structured, multi-agent attack framework, and demonstrates its effectiveness across multiple models, exposing weaknesses in their safety alignment.

Contribution

The paper presents PolyJailbreak, a novel structured framework for black-box multimodal jailbreak attacks, leveraging a library of primitives and reinforcement learning to improve attack success rates.

Findings

01 PolyJailbreak achieves an attack success rate above 95% on commercial models.

02 It outperforms existing jailbreak methods by 18.15% on average.

03 Visual inputs can disrupt cross-modal safety constraints.

Abstract

Multimodal large language models (MLLMs) have become integral to a wide range of real-world applications by jointly reasoning over text and visual inputs. However, despite recent advances in safety alignment, MLLMs remain vulnerable to jailbreak attacks, where carefully crafted inputs can bypass safety mechanisms and elicit harmful responses. In this work, we investigate the security vulnerabilities of MLLMs in text-vision scenarios and propose a novel black-box jailbreak framework, named PolyJailbreak. We first identify a phenomenon, termed multimodal safety asymmetry, where visual alignment introduces uneven safety constraints across modalities and weakens overall robustness. We analyze attention dynamics and latent representations in MLLMs, revealing that visual inputs can disrupt cross-modal information flow and reduce the model's ability to separate benign and malicious intents.…


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Topic Modeling