Cross-Modal Backdoors in Multimodal Large Language Models
Runhe Wang, Li Bai, Haibo Hu, Songze Li

TL;DR
This paper uncovers a novel cross-modal backdoor attack in multimodal large language models, exploiting lightweight connectors to compromise model security across different input modalities.
Contribution
It introduces a new attack method that poisons connectors to enable cross-modal backdoor activation, revealing a critical security vulnerability in MLLMs.
Findings
Achieves up to 99.9% attack success rate in same-modality settings.
Over 95% success rate in cross-modal settings under bounded perturbations.
Existing defenses fail to mitigate this attack unless they also sacrifice model utility.
Abstract
Developers increasingly construct multimodal large language models (MLLMs) by assembling pretrained components, introducing supply-chain attack surfaces. Existing security research primarily focuses on poisoning backbones such as encoders or large language models (LLMs), while the security risks of lightweight connectors remain unexplored. In this work, we propose a novel cross-modal backdoor attack that exploits this overlooked vulnerability. By poisoning only the connector using a single seed sample and several augmented variants from one modality, the adversary can subsequently activate the backdoor using inputs from other modalities. To achieve this, we first poison the connector to associate a compact latent region with a malicious target output. To activate the backdoor from other modalities, we further extract a malicious centroid from the poisoned latent representations and perform…
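The centroid idea in the abstract can be illustrated with a toy sketch. The abstract is truncated before it describes the activation step, so everything below is an assumption made for illustration and not the authors' procedure: the latent vectors are random stand-ins, the matrix `A` is a hypothetical linear substitute for another modality's encoder plus connector, and plain projected sign-gradient descent under an L-infinity bound stands in for whatever optimization the paper actually uses.

```python
import numpy as np

def malicious_centroid(latents):
    """Return the mean of the latent vectors (one vector per row)."""
    return np.asarray(latents, dtype=float).mean(axis=0)

def bounded_perturbation(x, A, centroid, eps=0.1, step=0.01, iters=200):
    """Search for delta with max|delta| <= eps that pulls A @ (x + delta)
    toward the centroid. Hypothetical stand-in: sign-gradient descent on the
    squared distance, clipped back into the L-infinity ball after each step."""
    delta = np.zeros_like(x)
    for _ in range(iters):
        residual = A @ (x + delta) - centroid
        grad = 2.0 * A.T @ residual              # gradient of ||A(x+delta) - c||^2
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
    return delta

rng = np.random.default_rng(0)

# Toy "poisoned" latents: noisy copies of one seed vector, standing in for
# the seed sample and its augmented variants.
seed_latent = rng.normal(size=8)
latents = seed_latent + 0.05 * rng.normal(size=(6, 8))
centroid = malicious_centroid(latents)

# Hypothetical linear map for a second modality (assumption, not from the paper).
A = rng.normal(size=(8, 16))
x = rng.normal(size=16)

delta = bounded_perturbation(x, A, centroid)
before = np.linalg.norm(A @ x - centroid)
after = np.linalg.norm(A @ (x + delta) - centroid)
print(f"distance to centroid: before={before:.3f}, after={after:.3f}")
```

The sketch only shows the geometry: a centroid summarizes the compact latent region, and a small bounded change to an input from another modality can move its latent representation closer to that region. Real encoders and connectors are nonlinear, so the gradient would come from automatic differentiation instead of the closed form used here.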
