Cross-Modal Backdoors in Multimodal Large Language Models

Runhe Wang; Li Bai; Haibo Hu; Songze Li

arXiv:2605.07490·cs.CR·May 11, 2026

Cross-Modal Backdoors in Multimodal Large Language Models

Runhe Wang, Li Bai, Haibo Hu, Songze Li

PDF

TL;DR

This paper uncovers a novel cross-modal backdoor attack in multimodal large language models, exploiting lightweight connectors to compromise model security across different input modalities.

Contribution

It introduces a new attack method that poisons connectors to enable cross-modal backdoor activation, revealing a critical security vulnerability in MLLMs.

Findings

01

Achieves up to 99.9% attack success rate in same-modality settings.

02

Over 95% success rate in cross-modal settings under bounded perturbations.

03

Existing defenses are ineffective against this attack without utility loss.

Abstract

Developers increasingly construct multimodal large language models (MLLMs) by assembling pretrained components,introducing supply-chain attack surfaces.Existing security research primarily focuses on poisoning backbones such as encoders or large language models (LLMs),while the security risks of lightweight connectors remain unexplored.In this work,we propose a novel cross-modal backdoor attack that exploits this overlooked vulnerability.By poisoning only the connector using a single seed sample and several augmented variants from one modality,the adversary can subsequently activate the backdoor using inputs from other modalities.To achieve this,we first poison the connector to associate a compact latent region with a malicious target output.To activate the backdoor from other modalities,we further extract a malicious centroid from the poisoned latent representations and perform…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.