Backdooring Vision-Language Models with Out-Of-Distribution Data

Weimin Lyu; Jiachen Yao; Saumya Gupta; Lu Pang; Tao Sun; Lingjie Yi; Lijie Hu; Haibin Ling; Chao Chen

arXiv:2410.01264·cs.CV·March 3, 2025

Open Access · 3 Reviews

TL;DR

This paper demonstrates a novel method for backdooring vision-language models using only out-of-distribution data, exposing security vulnerabilities in complex image-to-text tasks without access to original training data.

Contribution

It introduces VLOOD, a new approach for backdooring VLMs with OOD data, effective in complex tasks and without needing original training data.

Findings

01

Successful backdoor attacks on VLMs in image captioning and VQA.

02

Minimal semantic degradation under poisoned inputs.

03

Reveals critical security vulnerabilities in VLMs.

Abstract

The emergence of Vision-Language Models (VLMs) represents a significant advancement in integrating computer vision with Large Language Models (LLMs) to generate detailed text descriptions from visual inputs. Despite their growing importance, the security of VLMs, particularly against backdoor attacks, is underexplored. Moreover, prior works often assume attackers have access to the original training data, which is often unrealistic. In this paper, we address a more practical and challenging scenario where attackers must rely solely on Out-Of-Distribution (OOD) data. We introduce VLOOD (Backdooring Vision-Language Models with Out-of-Distribution Data), a novel approach with two key contributions: (1) demonstrating backdoor attacks on VLMs in complex image-to-text tasks while minimizing degradation of the original semantics under poisoned inputs, and (2) proposing innovative techniques…

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The experimental setting in this paper is practical and accessible. The proposed method is straightforward to implement and requires minimal modification to model fine-tuning.
2. Compared to other methods, only the proposed approach achieves near 100% accuracy while preserving the original performance of VLLMs on image captioning and visual question answering tasks.
3. The experiments are comprehensive, and results across three VLLMs and three datasets demonstrate the effectiveness of the

Weaknesses

Although the two proposed loss functions are straightforward, I am not very clear about their necessity in their current form. Specifically:
- For the CKP loss, it has the same role as $\mathcal{L}_{LM(clean)}$ in general, which is to minimize the distance between the backdoor model's output distribution and the gold distribution on clean images. The only difference is that for the LM loss the gold distribution is the ground truth, and for the CKP loss the gold distribution is the distribution from

Reviewer 02 · Rating 5 · Confidence 4

Strengths

**Pros:**
1. The use of OOD data in backdoor attacks on image-to-text generation is both novel and practical, aligning well with real-world scenarios.
2. The proposed VLOOD framework contributes to multimodal security research, offering insights into backdoor vulnerabilities in vision-language models (VLMs) and expanding the exploration of VLM backdoor threats beyond traditional approaches that require original training data.

Weaknesses

**Main Concerns:**
**1. Insufficient Justification of Loss Function Choices:** The paper introduces two main loss functions, CKP and CCP, to balance model performance on clean and poisoned inputs. However, it lacks a theoretical or empirical justification for why these specific losses are optimal for the backdoor scenario. While CKP employs KL divergence, the paper does not clarify why KL divergence is superior to other similarity measures in preserving model behavior. Likewise, for CCP, the ch
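The contrast the reviewers draw, between a language-modelling loss against ground-truth tokens and a KL-divergence term against a reference distribution, can be illustrated with a minimal sketch. This is not the VLOOD implementation: the logits, the function names, and the choice of reference distribution below are all hypothetical, and where the reference distribution comes from in the paper is not shown in this excerpt. The sketch only shows that the two quantities coincide when the reference is a one-hot label and differ when it is a soft distribution.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target_index, predicted):
    # Loss against a single ground-truth token (a one-hot target).
    return -math.log(predicted[target_index])

def kl_divergence(reference, predicted):
    # KL(reference || predicted); terms with zero reference mass contribute 0.
    return sum(r * math.log(r / p) for r, p in zip(reference, predicted) if r > 0)

# Hypothetical next-token distributions over a 3-token vocabulary.
reference = softmax([2.0, 1.0, 0.1])   # assumed soft reference distribution
predicted = softmax([1.5, 1.2, 0.3])   # assumed output of the model being tuned

ce = cross_entropy(0, predicted)                 # ground-truth style loss
kl_soft = kl_divergence(reference, predicted)    # soft-reference loss
kl_one_hot = kl_divergence([1.0, 0.0, 0.0], predicted)

print(f"cross-entropy vs. token 0: {ce:.4f}")
print(f"KL vs. soft reference:     {kl_soft:.4f}")
print(f"KL vs. one-hot reference:  {kl_one_hot:.4f}")
```

With a one-hot reference the KL term reduces to the cross-entropy, which is the sense in which the two losses play the same role; any difference in behaviour comes from the soft reference carrying information about tokens other than the single ground-truth one.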

Reviewer 03 · Rating 8 · Confidence 3

Strengths

1. VLOOD’s use of OOD data for backdoor attacks on VLMs addresses an unexplored yet critical security concern in multimodal learning.
2. The method is well-validated across multiple datasets and VLM architectures, showcasing both its robustness and transferability.
3. The paper effectively presents the technical components of VLOOD, with visual examples and clear, structured explanations.
4. By revealing VLM vulnerabilities in realistic settings, VLOOD has substantial implications for the des

Weaknesses

1. Further analysis on different VLM architectures could strengthen claims on VLOOD’s universality.

Taxonomy

Topics: Semantic Web and Ontologies