BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models

Juncheng Li; Yige Li; Hanxun Huang; Yunhao Chen; Xin Wang; Yixu Wang; Xingjun Ma; Yu-Gang Jiang

arXiv:2511.18921·cs.CV·November 25, 2025

BackdoorVLM: A Benchmark for Backdoor Attacks on Vision-Language Models

Juncheng Li, Yige Li, Hanxun Huang, Yunhao Chen, Xin Wang, Yixu Wang, Xingjun Ma, Yu-Gang Jiang

PDF

Open Access

TL;DR

BackdoorVLM introduces a comprehensive benchmark for evaluating backdoor attacks on vision-language models, revealing their vulnerabilities and providing a foundation for future defense strategies in multimodal AI systems.

Contribution

This work is the first to systematically evaluate backdoor threats across vision-language models using a unified benchmark and diverse attack categories.

Findings

01

VLMs are highly sensitive to textual triggers.

02

Text triggers often dominate bimodal backdoor effects.

03

Low poisoning rates can achieve high attack success.

Abstract

Backdoor attacks undermine the reliability and trustworthiness of machine learning systems by injecting hidden behaviors that can be maliciously activated at inference time. While such threats have been extensively studied in unimodal settings, their impact on multimodal foundation models, particularly vision-language models (VLMs), remains largely underexplored. In this work, we introduce \textbf{BackdoorVLM}, the first comprehensive benchmark for systematically evaluating backdoor attacks on VLMs across a broad range of settings. It adopts a unified perspective that injects and analyzes backdoors across core vision-language tasks, including image captioning and visual question answering. BackdoorVLM organizes multimodal backdoor threats into 5 representative categories: targeted refusal, malicious injection, jailbreak, concept substitution, and perceptual hijack. Each category…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsAdversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)