VLMShield: Efficient and Robust Defense of Vision-Language Models against Malicious Prompts

Peigui Qi; Kunsheng Tang; Yanpu Yu; Jialin Wu; Yide Song; Wenbo Zhou; Zhicong Huang; Cheng Hong; Weiming Zhang; Nenghai Yu

arXiv:2604.06502·cs.LG·April 9, 2026

TL;DR

VLMShield is a lightweight, plug-and-play safety detector for vision-language models. It builds on MAFE, a feature extraction framework that lets CLIP handle long text and fuse image and text into a unified representation, and uses those features to identify malicious multimodal prompts.

Contribution

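The paper proposes the Multimodal Aggregated Feature Extraction (MAFE) framework, which fuses multimodal information into unified representations, and VLMShield, a lightweight safety detector built on MAFE features that identifies malicious prompts efficiently and robustly.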

Findings

01  The authors report that VLMShield outperforms existing defenses in robustness, efficiency, and utility.

02  MAFE-extracted features of benign and malicious prompts follow distinct distributional patterns.

03  The code is publicly available in the GitHub repository pgqihere/VLMShield.

Abstract

Vision-Language Models (VLMs) face significant safety vulnerabilities from malicious prompt attacks because integrating the visual modality weakens their alignment. Existing defenses suffer from limited efficiency and robustness. To address these challenges, we first propose the Multimodal Aggregated Feature Extraction (MAFE) framework, which enables CLIP to handle long text and fuse multimodal information into unified representations. Through empirical analysis of MAFE-extracted features, we discover distinct distributional patterns between benign and malicious prompts. Building upon this finding, we develop VLMShield, a lightweight safety detector that efficiently identifies multimodal malicious attacks as a plug-and-play solution. Extensive experiments demonstrate superior performance across multiple dimensions, including robustness, efficiency, and utility. Through our work, we hope to pave the way for…
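The abstract does not spell out how MAFE aggregates features, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes one plausible design: split long text into windows that fit CLIP's 77-token text context, encode each window, mean-pool the window embeddings, and concatenate the result with an image embedding. The encoder `toy_encode` is a hypothetical hashed stand-in for a real CLIP encoder, whitespace splitting stands in for CLIP's BPE tokenizer, and the nearest-centroid rule in `detect` is an assumed placeholder for the lightweight detector, chosen only because the abstract reports that benign and malicious features are distributed differently.

```python
import hashlib
import math
from typing import List

CLIP_CONTEXT_LEN = 77  # CLIP's text encoder accepts at most 77 tokens per input


def chunk_tokens(tokens: List[str], window: int = CLIP_CONTEXT_LEN) -> List[List[str]]:
    """Split a token list into consecutive windows that each fit the encoder."""
    chunks = [tokens[i:i + window] for i in range(0, len(tokens), window)]
    return chunks or [[]]  # keep one (empty) chunk so pooling never sees zero chunks


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def toy_encode(tokens: List[str], dim: int = 8) -> List[float]:
    """Hypothetical stand-in for a CLIP text encoder: a hashed bag of tokens."""
    vec = [0.0] * dim
    for tok in tokens:
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return l2_normalize(vec)


def mean_pool(vectors: List[List[float]]) -> List[float]:
    """Average a non-empty list of equal-length vectors element-wise."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def aggregate_features(text: str, image_vec: List[float], dim: int = 8) -> List[float]:
    """Assumed fusion: pooled chunk embeddings concatenated with the image embedding."""
    tokens = text.split()  # whitespace split; real CLIP uses a BPE tokenizer
    chunk_vecs = [toy_encode(chunk, dim) for chunk in chunk_tokens(tokens)]
    text_vec = l2_normalize(mean_pool(chunk_vecs))
    return text_vec + l2_normalize(image_vec)


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; defined as 0.0 when either vector has zero length."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def detect(feature: List[float], benign_centroid: List[float],
           malicious_centroid: List[float]) -> bool:
    """Assumed detector: flag the prompt if it lies closer to the malicious centroid."""
    return cosine(feature, malicious_centroid) > cosine(feature, benign_centroid)
```

In this sketch a 200-token prompt is split into windows of 77, 77, and 46 tokens before pooling, so no text is truncated. The two centroids would have to be estimated from labeled benign and malicious prompts; a trained classifier could replace `detect` without changing the feature pipeline.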

Code & Models

Repositories

pgqihere/VLMShield (GitHub)