Making MLLMs Blind: Adversarial Smuggling Attacks in MLLM Content Moderation
Zhiheng Li, Zongyang Ma, Yuntong Pan, Ziqi Zhang, Xiaolei Lv, Bo Li, Jun Gao, Jianing Zhang, Chunfeng Yuan, Bing Li, and Weiming Hu

TL;DR
This paper uncovers a new threat to multimodal large language models (MLLMs) used as content moderators: harmful content encoded in visual formats that humans can read but models cannot, which lets it evade automated detection. It also introduces a benchmark for evaluating model vulnerabilities and mitigation strategies.
Contribution
The paper identifies adversarial smuggling attacks, which exploit gaps in model perception and reasoning, and provides SmuggleBench, the first comprehensive benchmark for evaluating model vulnerability to these attacks.
Findings
State-of-the-art models are vulnerable, with attack success rates above 90%.
Limited vision encoder capabilities and OCR robustness gaps are root causes.
Test-time scaling and adversarial training show potential mitigation benefits.
Abstract
Multimodal Large Language Models (MLLMs) are increasingly being deployed as automated content moderators. Within this landscape, we uncover a critical threat: Adversarial Smuggling Attacks. Unlike adversarial perturbations (for misclassification) and adversarial jailbreaks (for harmful output generation), adversarial smuggling exploits the Human-AI capability gap. It encodes harmful content into human-readable visual formats that remain AI-unreadable, thereby evading automated detection and enabling the dissemination of harmful content. We classify smuggling attacks into two pathways: (1) Perceptual Blindness, disrupting text recognition; and (2) Reasoning Blockade, inhibiting semantic understanding despite successful text recognition. To evaluate this threat, we constructed SmuggleBench, the first comprehensive benchmark comprising 1,700 adversarial smuggling attack instances.…
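As a rough illustration of how an attack success rate could be tallied per pathway, the sketch below assumes that an attack "succeeds" when the moderator fails to flag a harmful instance. That definition, the `Result` record, and the pathway labels are assumptions made for this example; the summary above does not give the paper's exact metric or data format.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Result:
    """One benchmark instance and the moderator's verdict (hypothetical schema)."""
    pathway: str   # e.g. "perceptual_blindness" or "reasoning_blockade"
    flagged: bool  # True if the moderator flagged the instance as harmful


def attack_success_rate(results):
    """Return {pathway: fraction of instances the moderator did NOT flag}.

    Assumption: every instance is harmful, so an unflagged instance
    counts as a successful attack.
    """
    totals = defaultdict(int)
    evaded = defaultdict(int)
    for r in results:
        totals[r.pathway] += 1
        if not r.flagged:
            evaded[r.pathway] += 1
    return {pathway: evaded[pathway] / totals[pathway] for pathway in totals}


# Toy verdicts, invented for illustration only (not results from the paper).
toy_results = [
    Result("perceptual_blindness", flagged=False),
    Result("perceptual_blindness", flagged=False),
    Result("perceptual_blindness", flagged=False),
    Result("perceptual_blindness", flagged=True),
    Result("reasoning_blockade", flagged=False),
    Result("reasoning_blockade", flagged=True),
]

rates = attack_success_rate(toy_results)
```

Reporting the rate per pathway rather than as a single number keeps the two failure modes apart: a model can read the text correctly and still miss its meaning, and an aggregate figure would hide that.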
