Contextual Image Attack: How Visual Context Exposes Multimodal Safety Vulnerabilities

Yuan Xiong; Ziqi Miao; Lijun Li; Chen Qian; Jie Li; Jing Shao

arXiv:2512.02973 · cs.CV · December 3, 2025

TL;DR

This paper introduces Contextual Image Attack (CIA), a novel image-centric method that exploits visual context to effectively jailbreak multimodal large language models, revealing vulnerabilities in their safety alignment.

Contribution

The paper presents a new image-focused attack approach using multi-agent systems and visualization strategies to embed harmful queries, surpassing prior text-image interaction methods.

Findings

01

CIA achieves high toxicity scores of 4.73 and 4.83 against GPT-4o and Qwen2.5-VL-72B, respectively.

02

Attack Success Rates (ASR) reach 86.31% and 91.07% against GPT-4o and Qwen2.5-VL-72B, respectively.

03

CIA outperforms prior methods at exposing safety vulnerabilities in MLLMs.
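
The two metrics reported in the findings can be illustrated with a minimal sketch. It assumes that each attack attempt receives an integer toxicity score from a judge and that an attempt counts as a success when its score reaches a threshold. The scale, the threshold value, the function name `summarize_attack_results`, and the sample scores are all hypothetical and are not taken from the paper.

```python
from statistics import mean


def summarize_attack_results(scores, success_threshold=5):
    """Return (mean toxicity score, attack success rate in percent).

    Assumption: `scores` holds one judge-assigned toxicity score per
    attack attempt, and an attempt succeeds when its score is at least
    `success_threshold`. The paper's exact criterion may differ.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    successes = sum(1 for s in scores if s >= success_threshold)
    asr_percent = 100 * successes / len(scores)
    return mean(scores), asr_percent


# Hypothetical judge scores for eight attempts (illustrative only).
example_scores = [5, 5, 4, 5, 3, 5, 5, 2]
avg_toxicity, asr = summarize_attack_results(example_scores)
print(f"mean toxicity: {avg_toxicity:.2f}, ASR: {asr:.2f}%")
```

Under these assumptions, the mean toxicity score reflects how harmful responses are on average, while ASR only counts the attempts that cross the success threshold, so the two numbers can move independently.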

Abstract

While Multimodal Large Language Models (MLLMs) show remarkable capabilities, their safety alignments are susceptible to jailbreak attacks. Existing attack methods typically focus on text-image interplay, treating the visual modality as a secondary prompt. This approach underutilizes the unique potential of images to carry complex, contextual information. To address this gap, we propose a new image-centric attack method, Contextual Image Attack (CIA), which employs a multi-agent system to subtly embed harmful queries into seemingly benign visual contexts using four distinct visualization strategies. To further enhance the attack's efficacy, the system incorporates contextual element enhancement and automatic toxicity obfuscation techniques. Experimental results on the MMSafetyBench-tiny dataset show that CIA achieves high toxicity scores of 4.73 and 4.83 against the GPT-4o and…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Hate Speech and Cyberbullying Detection