Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack

Juan Ren; Mark Dras; Usman Naseem

arXiv:2505.21967 · cs.CL · May 29, 2025

Open Access

TL;DR

This paper investigates security vulnerabilities in large vision-language models by analyzing their susceptibility to adversarial attacks, proposing a new evaluation framework, and defining normative safety standards for multimodal AI systems.

Contribution

It introduces a systematic analysis of adversarial vulnerabilities in LVLMs, a novel two-stage evaluation framework, and a normative schema for safety alignment in multimodal models.

Findings

01. Conventional adversarial attacks can bypass safety mechanisms in LVLMs.

02. The proposed framework effectively differentiates types of model responses to harmful prompts.

03. A normative schema provides a target for aligning model behavior with safety standards.

Abstract

Large Vision-Language Models (LVLMs) have shown remarkable capabilities across a wide range of multimodal tasks. However, their integration of visual inputs introduces expanded attack surfaces, thereby exposing them to novel security vulnerabilities. In this work, we conduct a systematic representational analysis to uncover why conventional adversarial attacks can circumvent the safety mechanisms embedded in LVLMs. We further propose a novel two-stage evaluation framework for adversarial attacks on LVLMs. The first stage differentiates among instruction non-compliance, outright refusal, and successful adversarial exploitation. The second stage quantifies the degree to which the model's output fulfills the harmful intent of the adversarial prompt, while categorizing refusal behavior into direct refusals, soft refusals, and partial refusals that remain inadvertently helpful. Finally, we…
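
The two-stage labeling described in the abstract can be pictured as a small record type. The sketch below is only an illustration built from the category names in the abstract, not the authors' implementation. The class names, the `harm_score` field with its 0 to 1 range, the rule that a refusal subtype accompanies only a stage-one refusal, and the `tally` helper are all assumptions made for this example.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Stage1Outcome(Enum):
    """Stage one: the three response types named in the abstract."""
    NON_COMPLIANCE = "instruction non-compliance"
    REFUSAL = "outright refusal"
    EXPLOITED = "successful adversarial exploitation"


class RefusalType(Enum):
    """Stage two: the refusal categories named in the abstract."""
    DIRECT = "direct refusal"
    SOFT = "soft refusal"
    PARTIAL = "partial refusal that remains inadvertently helpful"


@dataclass(frozen=True)
class Evaluation:
    """One labeled model response (hypothetical record layout)."""
    outcome: Stage1Outcome
    refusal_type: Optional[RefusalType] = None
    # Assumed scale: 0.0 = harmful intent not fulfilled, 1.0 = fully fulfilled.
    harm_score: Optional[float] = None

    def __post_init__(self) -> None:
        # Assumption: a refusal subtype is recorded only for stage-one refusals.
        if self.refusal_type is not None and self.outcome is not Stage1Outcome.REFUSAL:
            raise ValueError("refusal_type requires outcome == REFUSAL")
        if self.harm_score is not None and not 0.0 <= self.harm_score <= 1.0:
            raise ValueError("harm_score must lie in [0, 1]")


def tally(evaluations: Iterable[Evaluation]) -> Counter:
    """Count how many responses fall into each stage-one category."""
    return Counter(e.outcome for e in evaluations)
```

A set of labeled responses can then be summarized with `tally(...)`, which returns a count for each stage-one category.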

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Ethics and Social Impacts of AI

Methods: Network On Network