Detecting Backdoor Samples in Contrastive Language Image Pretraining

Hanxun Huang; Sarah Erfani; Yige Li; Xingjun Ma; James Bailey

arXiv:2502.01385·cs.LG·February 11, 2025

TL;DR

This paper identifies unique local subspace characteristics of backdoor samples in CLIP models and proposes an efficient detection method using density ratio-based local outlier detectors, revealing existing backdoors in popular datasets.

Contribution

It introduces a novel detection approach for CLIP backdoor attacks based on local subspace sparsity, outperforming existing methods and uncovering unintentional backdoors in datasets.

Findings

01

Backdoor samples in CLIP exhibit sparse local neighborhoods.

02

Density ratio-based detectors effectively identify backdoor samples.

03

Unintentional backdoors exist in popular web datasets like CC3M.

Abstract

Contrastive language-image pretraining (CLIP) has been found to be vulnerable to poisoning backdoor attacks, where the adversary can achieve an almost perfect attack success rate on CLIP models by poisoning only 0.01% of the training dataset. This raises security concerns about the current practice of pretraining large-scale models on unscrutinized web data using CLIP. In this work, we analyze the representations of backdoor-poisoned samples learned by CLIP models and find that they exhibit unique characteristics in their local subspace, i.e., their local neighborhoods are far sparser than those of clean samples. Based on this finding, we conduct a systematic study on detecting CLIP backdoor attacks and show that these attacks can be easily and efficiently detected by traditional density ratio-based local outlier detectors, whereas existing backdoor sample detection methods fail. Our…
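The density-ratio idea in the abstract can be illustrated with a minimal sketch. It assumes a simplified local outlier factor (SLOF) in its common formulation: a sample's k-nearest-neighbor distance divided by the k-nearest-neighbor distance of each of its neighbors, averaged over the neighbors. The function name `slof_scores`, the synthetic Gaussian features standing in for CLIP embeddings, and every parameter value are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def slof_scores(points, k):
    """Simplified LOF (assumed formulation): mean ratio of a point's
    k-NN distance to the k-NN distances of its k nearest neighbors."""
    n = len(points)
    neighbors, k_dist = [], []
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        ranked = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )[:k]
        neighbors.append([j for _, j in ranked])
        k_dist.append(ranked[-1][0])  # distance to the k-th nearest neighbor
    # A score near 1 means the point is about as dense as its neighbors;
    # a score well above 1 means its neighborhood is comparatively sparse.
    return [
        sum(k_dist[i] / k_dist[j] for j in neighbors[i]) / k for i in range(n)
    ]


# Synthetic stand-in for embeddings (assumption): one dense "clean" cluster
# plus a handful of samples placed far away, fewer than k of them.
rng = random.Random(0)
dim, n_clean, n_poison, k = 8, 200, 3, 10
clean = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_clean)]
poison = [[rng.gauss(12.0, 1.0) for _ in range(dim)] for _ in range(n_poison)]
features = clean + poison  # the last n_poison indices are the planted samples

scores = slof_scores(features, k)
# Flag the n_poison highest-scoring samples.
flagged = sorted(range(len(features)), key=lambda i: scores[i], reverse=True)[:n_poison]
print(sorted(flagged))
```

Because the planted group is smaller than k, each planted sample has to reach into the clean cluster to fill its neighborhood, so its k-nearest-neighbor distance is much larger than that of its neighbors. The brute-force pairwise search is only suitable for toy sizes; a dataset at the scale of CC3M would need batched or approximate nearest-neighbor search.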

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

+ Proposed a backdoor data detection method for CLIP models.
+ Experimental results show a high detection rate on the tested models.

Weaknesses

- The technical novelty is very limited, as the main idea of this paper is simply applying existing outlier detection metrics to detect CLIP backdoor samples. I didn't see a major technical innovation in this process.
- They only considered simple and naive attack settings (directly poisoning the data with fixed noise), for which it seems reasonable that common outlier metrics can detect the poisoned data. It is not clear whether an optimized trigger can be detected, for example [1

Reviewer 02 · Rating 6 · Confidence 2

Strengths

Efficiency: The method can quickly detect backdoor samples in large-scale datasets. Using 4 Nvidia A100 GPUs, a web-scale dataset with millions of samples (e.g., CC3M) can be cleaned in 15 minutes, which is especially important for processing large-scale datasets.
Accuracy: The proposed methods, especially the density-based local anomaly detection methods (e.g., SLOF and DAO), show high accuracy in detecting CLIP backdoor samples. These methods are able to effectively distinguish backdoor samples from normal

Weaknesses

Sensitivity to parameters: Local anomaly detection methods (e.g., SLOF and DAO) rely on the choice of the locality parameter k. Although the paper mentions that these methods are relatively robust to the value of k, improper parameter selection may still affect detection performance.
Dataset dependency: The method performs well on the CC3M dataset, but its validity on other datasets may require further validation, as different datasets may have different characteristics and distributions.

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The paper focuses on a compelling and timely topic in AI security.
2. A key discovery is that backdoor-poisoned samples in CLIP models exhibit distinctive characteristics in their local subspace, notably sparser local neighborhoods compared to clean samples.
3. The research reveals an intriguing and previously unknown unintentional backdoor in the widely-used CC3M dataset.

Weaknesses

1. The core defense strategy relies on the observation that backdoor examples exhibit sparser local neighborhoods than clean samples. This approach is particularly effective when the poisoning ratio is low, as the k-nearest neighbors of a backdoor example are likely to be clean samples. However, if my understanding is correct, as the poisoning ratio increases, the selection of the hyperparameter k becomes crucial. How should the hyperparameters of the detection algorithms be chosen?
2. As claimed by

Code & Models

Repositories

HanxunH/Detect-CLIP-Backdoor-Samples
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Contrastive Language-Image Pre-training