The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense

Yangyang Guo; Fangkai Jiao; Liqiang Nie; Mohan Kankanhalli

arXiv:2411.08410 · cs.CR · March 7, 2025

Open Access

TL;DR

This paper investigates a paradox in Vision Large Language Models (VLLMs): both jailbreak attacks and the defenses against them achieve high performance. It analyzes the underlying causes, examines the limitations of current defenses, and proposes a safety-aware detection method to improve trustworthiness.

Contribution

The paper offers a tentative explanation for VLLM jailbreak vulnerability, identifies the largely ignored problem of over-prudence in existing defenses, and introduces a simple safety-aware detection pipeline.

Findings

01. VLLMs are prone to jailbreak attacks, and one tentative explanation is the inclusion of vision inputs.

02. Current defenses suffer from over-prudence: they abstain unintentionally, even on benign inputs.

03. Jailbreak evaluation methods often show only chance-level agreement.

Abstract

The vulnerability of Vision Large Language Models (VLLMs) to jailbreak attacks comes as no surprise. However, recent defense mechanisms against these attacks have reached near-saturation performance on benchmark evaluations, often with minimal effort. This dual high performance in both attack and defense raises a fundamental and perplexing paradox. To gain a deep understanding of this issue and thus further help strengthen the trustworthiness of VLLMs, this paper makes three key contributions: i) One tentative explanation for VLLMs being prone to jailbreak attacks, namely the inclusion of vision inputs, together with an in-depth analysis. ii) The recognition of a largely ignored problem in existing defense mechanisms: over-prudence. The problem causes these defense methods to exhibit unintended abstention, even in the presence of benign inputs, thereby undermining their…

Taxonomy

Topics: Information and Cyber Security · Safety Systems Engineering in Autonomy · Cybersecurity and Cyber Warfare Studies