Effective and Efficient Adversarial Detection for Vision-Language Models via A Single Vector

Youcheng Huang; Fengbin Zhu; Jingkun Tang; Pan Zhou; Wenqiang Lei; Jiancheng Lv; Tat-Seng Chua

arXiv:2410.22888·cs.CV·October 31, 2024

TL;DR

This paper introduces RADAR, a large-scale adversarial image dataset, and NEARSIDE, a novel detection method using a single embedding vector, to improve the safety of vision-language models against adversarial attacks.

Contribution

It presents a new large-scale adversarial dataset and a novel embedding-based detection method that is effective, efficient, and transferable across models.

Findings

01. NEARSIDE effectively detects adversarial images targeting LLaVA and MiniGPT-4.

02. The method demonstrates high efficiency and cross-model transferability.

03. RADAR provides a comprehensive dataset for adversarial attack research.

Abstract

Visual Language Models (VLMs) are vulnerable to adversarial attacks, especially those from adversarial images, yet this threat remains under-explored in the literature. To facilitate research on this critical safety problem, we first construct a new laRge-scale Adversarial images dataset with Diverse hArmful Responses (RADAR), given that existing datasets are either small-scale or contain only limited types of harmful responses. With the new RADAR dataset, we further develop a novel and effective iN-time Embedding-based AdveRSarial Image DEtection (NEARSIDE) method, which exploits a single vector distilled from the hidden states of VLMs, which we call the attacking direction, to distinguish adversarial images from benign ones in the input. Extensive experiments with two victim VLMs, LLaVA and MiniGPT-4, demonstrate the effectiveness, efficiency, and cross-model…
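The abstract describes detection via a single "attacking direction" vector distilled from VLM hidden states, but the excerpt here does not spell out the computation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the direction is the difference between the mean hidden states of adversarial and benign inputs, and that an input is flagged when its projection onto that direction exceeds a midpoint threshold. All function names are hypothetical, and the hidden states are synthetic Gaussian vectors standing in for real VLM embeddings.

```python
import numpy as np


def attacking_direction(benign_states, adversarial_states):
    # Assumed distillation rule: difference of class means of hidden states.
    return adversarial_states.mean(axis=0) - benign_states.mean(axis=0)


def projection_scores(hidden_states, direction):
    # Scalar projection of each hidden state onto the unit-norm direction.
    unit = direction / np.linalg.norm(direction)
    return hidden_states @ unit


def fit_threshold(benign_scores, adversarial_scores):
    # Assumed decision rule: midpoint between the two mean projections.
    return 0.5 * (benign_scores.mean() + adversarial_scores.mean())


def detect(hidden_states, direction, threshold):
    # True where an input is flagged as adversarial.
    return projection_scores(hidden_states, direction) > threshold


# Synthetic stand-ins for hidden states (NOT data from the paper):
# adversarial vectors are benign-like vectors shifted along one axis.
rng = np.random.default_rng(0)
dim = 64
shift = np.zeros(dim)
shift[0] = 6.0
benign = rng.normal(size=(200, dim))
adversarial = rng.normal(size=(200, dim)) + shift

direction = attacking_direction(benign, adversarial)
threshold = fit_threshold(
    projection_scores(benign, direction),
    projection_scores(adversarial, direction),
)

adversarial_flag_rate = detect(adversarial, direction, threshold).mean()
benign_flag_rate = detect(benign, direction, threshold).mean()
print(f"flagged adversarial: {adversarial_flag_rate:.2f}")
print(f"flagged benign: {benign_flag_rate:.2f}")
```

Under these assumptions, detection costs one dot product per input once the hidden state is available, which is one plausible reading of the efficiency claim. The rates printed here describe only the synthetic toy data and say nothing about performance on RADAR.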

Code & Models

Repositories

mob-scu/radar-nearside
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications