From LLMs to MLLMs: Exploring the Landscape of Multimodal Jailbreaking
Siyuan Wang, Zhuohan Long, Zhihao Fan, Zhongyu Wei

TL;DR
This paper reviews recent advances in jailbreaking techniques for LLMs and MLLMs, emphasizing the need for more research on vulnerabilities in the multimodal domain to improve model robustness and security.
Contribution
It provides a comprehensive overview of current jailbreaking methods, evaluation benchmarks, and defense strategies for both LLMs and MLLMs, highlighting gaps in multimodal security research.
Findings
Multimodal jailbreaking research is less developed than unimodal.
Recent benchmarks and attack techniques have improved evaluation.
The paper identifies limitations of, and future directions for, multimodal security research.
Abstract
The rapid development of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has exposed vulnerabilities to various adversarial attacks. This paper provides a comprehensive overview of jailbreaking research targeting both LLMs and MLLMs, highlighting recent advancements in evaluation benchmarks, attack techniques, and defense strategies. Compared to the more advanced state of unimodal jailbreaking, the multimodal domain remains underexplored. We summarize the limitations and potential research directions of multimodal jailbreaking, aiming to inspire future research and further enhance the robustness and security of MLLMs.
Taxonomy
Topics: Artificial Intelligence in Law · Law in Society and Culture · Translation Studies and Practices
