Break the Brake, Not the Wheel: Untargeted Jailbreak via Entropy Maximization

Mengqi He; Xinyu Tian; Xin Shen; Shu Zou; Jinhong Ni; Zhaoyuan Yang; Weikang Li; Xuesong Li; Jing Zhang

arXiv:2605.10764 · cs.CV · May 12, 2026

TL;DR

This paper introduces UJEM-KL, a lightweight untargeted attack on vision-language models that maximizes entropy at high-entropy decision tokens to flip refusals and improve jailbreak transferability, challenging earlier findings that gradient-based image jailbreaks show little or no cross-model transfer.

Contribution

The paper proposes an entropy-maximization attack that improves the transferability of untargeted jailbreaks on VLMs, evaluated across three VLMs and two safety benchmarks.

Findings

01. UJEM-KL achieves high success rates in white-box attacks.

02. The method improves transferability across different models.

03. Transferability limitations are mainly due to constrained optimization objectives.

Abstract

Recent studies show that gradient-based universal image jailbreaks on vision-language models (VLMs) exhibit little or no cross-model transferability, casting doubt on the feasibility of transferable multimodal jailbreaks. We revisit this conclusion under a strictly untargeted threat model without enforcing a fixed prefix or response pattern. Our preliminary experiment reveals that refusal behavior concentrates at high-entropy tokens during autoregressive decoding, and non-refusal tokens already carry substantial probability mass among the top-ranked candidates before attack. Motivated by this finding, we propose Untargeted Jailbreak via Entropy Maximization (UJEM)-KL, a lightweight attack that maximizes entropy at these decision tokens to flip refusal outcomes, while stabilizing the remaining low-entropy positions to preserve output quality. Across three VLMs and two safety benchmarks,…
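The objective described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the loss, so the entropy threshold `tau`, the weight `lam`, the use of the clean model's entropy to pick decision tokens, the KL direction, and the function name `ujem_kl_style_loss` are all assumptions made for illustration. The sketch only shows the shape of the idea: reward entropy at high-entropy positions and penalize drift from the clean distribution everywhere else.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one position's vocabulary logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy in nats.
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q); eps guards against log of zero in q.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

def ujem_kl_style_loss(clean_logits, adv_logits, tau=1.0, lam=1.0):
    """Hypothetical loss to minimize over the adversarial image.

    clean_logits, adv_logits: one list of vocabulary logits per output position.
    tau (assumed): entropy threshold separating decision tokens from the rest.
    lam (assumed): weight of the stabilizing KL term.
    """
    loss = 0.0
    for clean, adv in zip(clean_logits, adv_logits):
        p_clean = softmax(clean)
        p_adv = softmax(adv)
        if entropy(p_clean) >= tau:
            # Assumed decision token: push the attacked distribution toward higher entropy.
            loss -= entropy(p_adv)
        else:
            # Assumed stable token: keep the attacked distribution close to the clean one.
            loss += lam * kl_divergence(p_clean, p_adv)
    return loss

# Toy example: position 0 is uncertain (decision token), position 1 is confident.
clean = [[0.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]]
unchanged = ujem_kl_style_loss(clean, clean)
collapsed = ujem_kl_style_loss(clean, [[10.0, 0.0, 0.0, 0.0], [10.0, 0.0, 0.0, 0.0]])
drifted = ujem_kl_style_loss(clean, [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]])
```

In the toy example, collapsing the decision token onto a single candidate or letting the confident token drift both raise the loss relative to the unchanged case, which is the behavior the abstract's description suggests. In a real attack the logits would come from a VLM and the loss would be differentiated with respect to the image perturbation, which this sketch does not attempt.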