Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey
Bhavuk Jain, Sercan Ö. Arık, Hardeo K. Thakur

TL;DR
This survey systematically analyzes adversarial threats to multimodal large language models, offering a taxonomy and vulnerability analysis to guide the development of more robust systems.
Contribution
It introduces a comprehensive taxonomy and vulnerability-centric framework to understand and categorize adversarial attacks on MLLMs.
Findings
Unified attack surface taxonomy across modalities
Linking vulnerabilities to architectural weaknesses
Guidance for developing robust multimodal models
Abstract
Multimodal large language models (MLLMs) integrate information from multiple modalities, including text, images, audio, and video, enabling complex capabilities such as visual question answering and audio translation. This increased expressiveness, while powerful, introduces new and amplified vulnerabilities to adversarial manipulation. This survey provides a comprehensive and systematic analysis of adversarial threats to MLLMs, moving beyond enumerating attack techniques to explain the underlying causes of model susceptibility. We introduce a taxonomy that organizes adversarial attacks according to attacker objectives, unifying diverse attack surfaces across modalities and deployment settings. We also present a vulnerability-centric analysis that links integrity attacks, safety and jailbreak failures, control and instruction hijacking, and training-time poisoning to shared…
