Adversarial Attacks on Multimodal Large Language Models: A Comprehensive Survey

Bhavuk Jain; Sercan Ö. Arık; Hardeo K. Thakur

arXiv:2603.27918·cs.CR·March 31, 2026

TL;DR

This survey systematically analyzes adversarial threats to multimodal large language models, offering a taxonomy and vulnerability analysis to guide the development of more robust systems.

Contribution

It introduces a comprehensive taxonomy and a vulnerability-centric framework for understanding and categorizing adversarial attacks on MLLMs.

Findings

01 Unified attack surface taxonomy across modalities
02 Linking vulnerabilities to architectural weaknesses
03 Guidance for developing robust multimodal models

Abstract

Multimodal large language models (MLLMs) integrate information from multiple modalities, such as text, images, audio, and video, enabling complex capabilities such as visual question answering and audio translation. While powerful, this increased expressiveness introduces new and amplified vulnerabilities to adversarial manipulation. This survey provides a comprehensive and systematic analysis of adversarial threats to MLLMs, moving beyond enumerating attack techniques to explain the underlying causes of model susceptibility. We introduce a taxonomy that organizes adversarial attacks according to attacker objectives, unifying diverse attack surfaces across modalities and deployment settings. We also present a vulnerability-centric analysis that links integrity attacks, safety and jailbreak failures, control and instruction hijacking, and training-time poisoning to shared…
