Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey
Xuannan Liu, Xing Cui, Peipei Li, Zekun Li, Huaibo Huang, Shuhan Xia, Miaoxuan Zhang, Yueying Zou, and Ran He

TL;DR
This survey reviews jailbreak attacks and defenses for multimodal generative models. It analyzes attack methods, defense strategies, and evaluation frameworks across modalities and system levels, with the aim of supporting safer deployment.
Contribution
The survey provides a detailed taxonomy of attack and defense methods for multimodal generative models, covers multiple modalities, and proposes future research directions.
Findings
Systematic exploration of attacks and defenses across four levels
Taxonomy of attack methods, defense mechanisms, and evaluation frameworks
Identification of current research challenges and future directions
Abstract
The rapid evolution of multimodal foundation models has led to significant advancements in cross-modal understanding and generation across diverse modalities, including text, images, audio, and video. However, these models remain susceptible to jailbreak attacks, which can bypass built-in safety mechanisms and induce the production of potentially harmful content. Consequently, understanding the methods of jailbreak attacks and existing defense mechanisms is essential to ensure the safe deployment of multimodal generative models in real-world scenarios, particularly in security-sensitive applications. To provide comprehensive insight into this topic, this survey reviews jailbreak and defense in multimodal generative models. First, given the generalized lifecycle of multimodal jailbreak, we systematically explore attacks and corresponding defense strategies across four levels: input,…
Taxonomy
Topics: Digital and Cyber Forensics · Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques
