Jailbreaking Multimodal Large Language Models via Shuffle Inconsistency
Shiji Zhao, Ranjie Duan, Fengxiang Wang, Chi Chen, Caixin Kang, Shouwei Ruan, Jialing Tao, YueFeng Chen, Hui Xue, Xingxing Wei

TL;DR
This paper identifies a Shuffle Inconsistency in multimodal large language models: a gap between how well they comprehend shuffled harmful instructions and how well their safety mechanisms handle them. The proposed jailbreak attack, SI-Attack, exploits this comprehension-safety gap to raise attack success rates on commercial models.
Contribution
The paper characterizes Shuffle Inconsistency in MLLMs and proposes SI-Attack, a jailbreak method based on black-box optimization, to bypass safety mechanisms.
Findings
SI-Attack improves attack success rates on multiple benchmarks.
SI-Attack significantly raises the attack success rate against commercial MLLMs such as GPT-4o and Claude-3.5-Sonnet.
Shuffle inconsistency can be exploited to develop more effective jailbreak attacks.
Abstract
Multimodal Large Language Models (MLLMs) have achieved impressive performance and have been put into practical use in commercial applications, but their safety mechanisms still have potential vulnerabilities. Jailbreak attacks are red-teaming methods that aim to bypass safety mechanisms and uncover MLLMs' potential risks. Existing MLLM jailbreak methods often bypass the model's safety mechanism through complex optimization methods or carefully designed image and text prompts. Despite some progress, they have a low attack success rate on commercial closed-source MLLMs. Unlike previous research, we empirically find that there exists a Shuffle Inconsistency between MLLMs' comprehension ability and safety ability for shuffled harmful instructions. That is, from the perspective of comprehension ability, MLLMs can understand shuffled harmful text-image instructions well.…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
