Jailbreak Attacks and Defenses against Multimodal Generative Models: A Survey

Xuannan Liu; Xing Cui; Peipei Li; Zekun Li; Huaibo Huang; Shuhan Xia; Miaoxuan Zhang; Yueying Zou; Ran He

arXiv:2411.09259·cs.CV·December 10, 2024·2 cites

TL;DR

This survey reviews jailbreak attacks and defenses in multimodal generative models, analyzing attack methods, defense strategies, and evaluation frameworks across modalities and system levels, with the aim of supporting safer deployment.

Contribution

It provides a detailed taxonomy of attack and defense methods in multimodal models, covering multiple modalities and proposing future research directions.

Findings

01. Systematic exploration of attacks and defenses across four levels
02. Taxonomy of attack methods, defense mechanisms, and evaluation frameworks
03. Identification of current research challenges and future directions

Abstract

The rapid evolution of multimodal foundation models has led to significant advancements in cross-modal understanding and generation across diverse modalities, including text, images, audio, and video. However, these models remain susceptible to jailbreak attacks, which can bypass built-in safety mechanisms and induce the production of potentially harmful content. Consequently, understanding the methods of jailbreak attacks and existing defense mechanisms is essential to ensure the safe deployment of multimodal generative models in real-world scenarios, particularly in security-sensitive applications. To provide comprehensive insight into this topic, this survey reviews jailbreak and defense in multimodal generative models. First, given the generalized lifecycle of multimodal jailbreak, we systematically explore attacks and corresponding defense strategies across four levels: input,…

Code & Models

Repositories

liuxuannan/awesome-multimodal-jailbreak (official)

Taxonomy

Topics: Digital and Cyber Forensics · Hate Speech and Cyberbullying Detection · Advanced Malware Detection Techniques