Zer0-Jack: A Memory-efficient Gradient-based Jailbreaking Method for Black-box Multi-modal Large Language Models

Tiejin Chen; Kaishen Wang; Hua Wei

arXiv:2411.07559 · cs.LG · January 28, 2026

TL;DR

Zer0-Jack is a memory-efficient, gradient-based black-box jailbreaking method for multi-modal large language models. It combines zeroth-order optimization, which estimates gradients from model outputs alone, with patch coordinate descent, and reaches high attack success rates without white-box access.

Contribution

The paper presents Zer0-Jack, a black-box attack technique that uses zeroth-order optimization and patch coordinate descent to remove the need for white-box access and to reduce the high memory usage of existing gradient-based attacks.
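
As a rough illustration of how zeroth-order optimization and patch coordinate descent can fit together, the sketch below minimizes a toy quadratic using only function evaluations, updating one block of coordinates (a "patch") at a time with a two-point random-direction gradient estimate. Everything in it is an assumption made for illustration: the names `zo_patch_descent` and `toy_loss`, the flat-list input, the choice of estimator, and the step sizes `lr` and `mu` are hypothetical and are not taken from the paper, whose actual loss and update rule are not given in this summary.

```python
import random


def toy_loss(x):
    """Hypothetical black-box objective: squared distance to the all-ones vector."""
    return sum((v - 1.0) ** 2 for v in x)


def zo_patch_descent(loss, x, patch_size, steps, lr=0.05, mu=1e-3, seed=0):
    """Minimize `loss` over a flat list using only loss evaluations.

    Each inner iteration perturbs a single patch (a contiguous block of
    coordinates) along a random Gaussian direction, estimates the directional
    derivative with a central difference, and updates only that patch.
    """
    rng = random.Random(seed)
    x = list(x)  # work on a copy so the caller's input is untouched
    n = len(x)
    for _ in range(steps):
        for start in range(0, n, patch_size):
            idx = list(range(start, min(start + patch_size, n)))
            u = [rng.gauss(0.0, 1.0) for _ in idx]  # random direction for this patch

            plus, minus = list(x), list(x)
            for k, i in enumerate(idx):
                plus[i] += mu * u[k]
                minus[i] -= mu * u[k]

            # Two loss queries give a scalar estimate of the slope along u.
            slope = (loss(plus) - loss(minus)) / (2.0 * mu)

            # Move the patch against the estimated gradient direction slope * u.
            for k, i in enumerate(idx):
                x[i] -= lr * slope * u[k]
    return x


if __name__ == "__main__":
    start = [0.0] * 16
    result = zo_patch_descent(toy_loss, start, patch_size=4, steps=200)
    print("initial loss:", toy_loss(start))
    print("final loss:", toy_loss(result))
```

Each update here needs only two evaluations of the objective and no backpropagation, so no intermediate activations or parameter gradients have to be stored. Restricting the random direction to one patch keeps the dimension of each estimate small, which generally lowers the variance of this kind of estimator.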

Findings

01. Achieves a 95% attack success rate on MiniGPT-4 in the black-box setting.

02. Surpasses previous transfer attack methods in effectiveness.

03. Remains effective against commercial MLLMs such as GPT-4o.

Abstract

Jailbreaking methods, which induce Multi-modal Large Language Models (MLLMs) to output harmful responses, raise significant safety concerns. Among these methods, gradient-based approaches, which use gradients to generate malicious prompts, have been widely studied due to their high success rates in white-box settings, where full access to the model is available. However, these methods have notable limitations: they require white-box access, which is not always feasible, and involve high memory usage. To address scenarios where white-box access is unavailable, attackers often resort to transfer attacks. In transfer attacks, malicious inputs generated using white-box models are applied to black-box models, but this typically results in reduced attack performance. To overcome these challenges, we propose Zer0-Jack, a method that bypasses the need for white-box access by leveraging…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis