CrossGuard: Safeguarding MLLMs against Joint-Modal Implicit Malicious Attacks

Xu Zhang; Hao Li; Zhichao Lu

arXiv:2510.17687·cs.CR·April 28, 2026

TL;DR

This paper introduces CrossGuard, a defense for multimodal large language models (MLLMs) that detects and mitigates both explicit attacks and implicit joint-modal attacks, in which individually benign text and image inputs together express unsafe intent.

Contribution

The paper presents ImpForge, an automated red-teaming pipeline that uses reinforcement learning to generate implicit attack data, and CrossGuard, an intent-aware safeguard trained on that data that outperforms existing defenses against multimodal threats.

Findings

01. CrossGuard significantly outperforms existing defenses in various benchmarks.
02. ImpForge generates diverse implicit attack samples across 14 domains.
03. CrossGuard maintains high utility while providing robust security.

Abstract

Multimodal Large Language Models (MLLMs) achieve strong reasoning and perception capabilities but are increasingly vulnerable to jailbreak attacks. While existing work focuses on explicit attacks, where malicious content resides in a single modality, recent studies reveal implicit attacks, in which benign text and image inputs jointly express unsafe intent. Such joint-modal threats are difficult to detect and remain underexplored, largely due to the scarcity of high-quality implicit data. We propose ImpForge, an automated red-teaming pipeline that leverages reinforcement learning with tailored reward modules to generate diverse implicit samples across 14 domains. Building on this dataset, we further develop CrossGuard, an intent-aware safeguard providing robust and comprehensive defense against both explicit and implicit threats. Extensive experiments across safe and unsafe benchmarks,…

Code & Models

Repositories

ZhangXu0963/CrossGuard (GitHub)
