Visual Exclusivity Attacks: Automatic Multimodal Red Teaming via Agentic Planning

Yunbei Zhang; Yingqiang Ge; Weijie Xu; Yuhui Xu; Jihun Hamm; Chandan K. Reddy

arXiv:2603.20198 · cs.CR · March 24, 2026

TL;DR

This paper introduces Visual Exclusivity (VE), a multimodal threat model in which harm emerges only through reasoning over image content, and MM-Plan, a planning-based attack framework that substantially raises attack success rates against advanced multimodal models, exposing safety vulnerabilities in them.

Contribution

It proposes a visual threat model that is more resilient to standard defenses than payload-wrapping attacks, and a planning-based attack framework that outperforms existing attack methods against large multimodal models.

Findings

01. MM-Plan achieves a 46.3% attack success rate against Claude 4.5 Sonnet.

02. MM-Plan attains a 13.8% attack success rate against GPT-5.

03. MM-Plan outperforms baseline attacks by a factor of 2 to 5.
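The findings are stated as attack success rates (ASR) and as a ratio over baselines. A minimal sketch of that arithmetic follows, assuming ASR is the fraction of attempts a judge marks as successful and that "2 to 5" compares ASRs directly. Neither the judging protocol nor the baseline numbers appear on this page, and the function names are hypothetical.

```python
def attack_success_rate(judgments):
    """Fraction of attack attempts judged successful.

    Assumption: `judgments` is an iterable of booleans, one per attempt,
    as produced by some (unspecified) success judge.
    """
    judgments = list(judgments)
    if not judgments:
        raise ValueError("need at least one judged attempt")
    return sum(judgments) / len(judgments)


def improvement_factor(asr_method, asr_baseline):
    """How many times higher the method's ASR is than a baseline's ASR."""
    if asr_baseline <= 0:
        raise ValueError("baseline ASR must be positive to form a ratio")
    return asr_method / asr_baseline


# Toy inputs only, not results from the paper.
print(attack_success_rate([True, False, True, True]))
print(improvement_factor(0.5, 0.25))
```

Under this reading, a factor of 2 to 5 means the method's ASR is two to five times the baseline's ASR on the same target and evaluation set.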

Abstract

Current multimodal red teaming treats images as wrappers for malicious payloads via typography or adversarial noise. These attacks are structurally brittle, as standard defenses neutralize them once the payload is exposed. We introduce Visual Exclusivity (VE), a more resilient Image-as-Basis threat where harm emerges only through reasoning over visual content such as technical schematics. To systematically exploit VE, we propose Multimodal Multi-turn Agentic Planning (MM-Plan), a framework that reframes jailbreaking from turn-by-turn reaction to global plan synthesis. MM-Plan trains an attacker planner to synthesize comprehensive, multi-turn strategies, optimized via Group Relative Policy Optimization (GRPO), enabling self-discovery of effective strategies without human supervision. To rigorously benchmark this reasoning-dependent threat, we introduce VE-Safety, a human-curated dataset…
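GRPO, which the abstract names as the optimizer for the attacker planner, replaces a learned value baseline with a group-relative one. Several plans are sampled for the same prompt, and each plan's reward is normalized by the mean and standard deviation of its own group. The sketch below shows that generic computation and a PPO-style clipped surrogate in plain Python. It illustrates GRPO as commonly described and is not the paper's training code. The reward values, the helper names, the `clip_eps` default, the use of the population standard deviation, and the omission of the KL penalty are all assumptions.

```python
import math
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: normalize each reward against its own group.

    Uses the population std (pstdev); some implementations use the sample std.
    `eps` keeps the division defined when every reward in the group is equal.
    """
    mu = fmean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO-style clipped objective averaged over one group (no KL term here).

    logp_new / logp_old: per-sample sequence log-probabilities under the
    current and the sampling policy. Training maximizes this value.
    """
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    return fmean(terms)


# Hypothetical binary rewards for four sampled plans in one group.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)
print(clipped_surrogate([0.0] * 4, [0.0] * 4, adv))
```

With hypothetical rewards [1, 0, 0, 1], the advantages come out near [1, -1, -1, 1]. A group in which every plan receives the same reward yields all-zero advantages and therefore no learning signal. This is why the group-relative baseline needs some variation in outcomes within each group.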

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis