Making MLLMs Blind: Adversarial Smuggling Attacks in MLLM Content Moderation

Zhiheng Li; Zongyang Ma; Yuntong Pan; Ziqi Zhang; Xiaolei Lv; Bo Li; Jun Gao; Jianing Zhang; Chunfeng Yuan; Bing Li; and Weiming Hu

arXiv:2604.06950·cs.CV·April 10, 2026

TL;DR

This paper identifies a threat to multimodal large language models (MLLMs) used as content moderators: harmful content is encoded in visual formats that humans can read but models cannot, so it evades automated detection. It also introduces a benchmark for evaluating model vulnerabilities and mitigation strategies.

Contribution

It identifies adversarial smuggling attacks exploiting perception and reasoning gaps, and provides the first comprehensive benchmark, SmuggleBench, to evaluate model vulnerabilities.

Findings

01. State-of-the-art models are vulnerable, with attack success rates above 90%.

02. Limited vision encoder capabilities and gaps in OCR robustness are root causes.

03. Test-time scaling and adversarial training show potential as mitigations.

Abstract

Multimodal Large Language Models (MLLMs) are increasingly being deployed as automated content moderators. Within this landscape, we uncover a critical threat: Adversarial Smuggling Attacks. Unlike adversarial perturbations (for misclassification) and adversarial jailbreaks (for harmful output generation), adversarial smuggling exploits the Human-AI capability gap. It encodes harmful content into human-readable visual formats that remain AI-unreadable, thereby evading automated detection and enabling the dissemination of harmful content. We classify smuggling attacks into two pathways: (1) Perceptual Blindness, disrupting text recognition; and (2) Reasoning Blockade, inhibiting semantic understanding despite successful text recognition. To evaluate this threat, we constructed SmuggleBench, the first comprehensive benchmark comprising 1,700 adversarial smuggling attack instances.…
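
As a rough illustration of how results on a benchmark like this might be tallied, the sketch below computes an attack success rate per pathway. It assumes the rate is the fraction of harmful instances that a moderator fails to flag; that definition, the function name `attack_success_rate`, the pathway labels, and the toy records are all hypothetical and are not taken from the paper or from SmuggleBench.

```python
from collections import defaultdict


def attack_success_rate(records):
    """Return the per-pathway fraction of harmful instances that evaded detection.

    `records` is an iterable of (pathway, flagged) pairs. Every instance is
    assumed to be harmful, so an unflagged instance counts as a successful attack.
    """
    totals = defaultdict(int)
    evaded = defaultdict(int)
    for pathway, flagged in records:
        totals[pathway] += 1
        if not flagged:
            evaded[pathway] += 1
    return {pathway: evaded[pathway] / totals[pathway] for pathway in totals}


# Made-up moderation outcomes, for illustration only.
records = [
    ("perceptual_blindness", False),
    ("perceptual_blindness", False),
    ("perceptual_blindness", True),
    ("reasoning_blockade", False),
]
rates = attack_success_rate(records)
for pathway, rate in sorted(rates.items()):
    print(f"{pathway}: {rate:.2f}")
```

Reporting the rate per pathway rather than as a single number keeps failures of text recognition separate from failures of semantic understanding.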

Code & Models

Repositories

zhihengli-casia/smugglebench
github

Datasets

zhihengli-casia/smugglebench
dataset · 181 downloads
