From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking

Siyuan Wang; Zhuohan Long; Zhihao Fan; Zhongyu Wei

arXiv:2406.14859·cs.CL·June 24, 2024

TL;DR

This paper reviews recent advances in jailbreaking techniques for LLMs and MLLMs, emphasizing the need for more research on vulnerabilities in the multimodal domain to improve model robustness and security.

Contribution

It provides a comprehensive overview of current jailbreaking methods, evaluation benchmarks, and defense strategies for both LLMs and MLLMs, highlighting gaps in multimodal security research.

Findings

01

Multimodal jailbreaking research is less developed than unimodal.

02

Recent benchmarks and attack techniques have improved evaluation.

03

The paper identifies limitations of, and future directions for, multimodal jailbreaking research.

Abstract

The rapid development of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has exposed vulnerabilities to various adversarial attacks. This paper provides a comprehensive overview of jailbreaking research targeting both LLMs and MLLMs, highlighting recent advancements in evaluation benchmarks, attack techniques, and defense strategies. Compared to the more advanced state of unimodal jailbreaking, the multimodal domain remains underexplored. We summarize the limitations and potential research directions of multimodal jailbreaking, aiming to inspire future research and further enhance the robustness and security of MLLMs.

Taxonomy

Topics: Artificial Intelligence in Law · Law in Society and Culture · Translation Studies and Practices