Test-Time Multimodal Backdoor Detection by Contrastive Prompting
Yuwei Niu, Shuo He, Qi Wei, Zongyu Wu, Feng Liu, Lei Feng

TL;DR
This paper introduces BDetCLIP, a novel, efficient test-time method for detecting backdoored images in multimodal contrastive models like CLIP, using contrastive prompting and distribution differences in similarity scores.
Contribution
It is the first to propose a computationally efficient, inference-stage backdoor detection method for CLIP that leverages contrastive prompting and language models.
Findings
BDetCLIP detects backdoored images more effectively than existing methods.
It is also more efficient, with lower computational cost.
It successfully detects backdoored images in various scenarios.
Abstract
While multimodal contrastive learning methods (e.g., CLIP) can achieve impressive zero-shot classification performance, recent research has revealed that these methods are vulnerable to backdoor attacks. To defend against backdoor attacks on CLIP, existing defense methods focus on either the pre-training stage or the fine-tuning stage; unfortunately, they incur high computational costs due to numerous parameter updates and are not applicable in black-box settings. In this paper, we provide the first attempt at a computationally efficient backdoor detection method to defend against backdoored CLIP in the inference stage. We empirically find that the visual representations of backdoored images are insensitive to both benign and malignant changes in class description texts. Motivated by this observation, we propose BDetCLIP, a novel test-time backdoor detection…
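The observation in the abstract (backdoored images are insensitive to benign versus malignant changes in class description texts) can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract is truncated before the method details, so the scoring rule (mean benign similarity minus mean malignant similarity), the function names, and the threshold value are all assumptions made for illustration. The embeddings are plain NumPy arrays standing in for CLIP image and text features.

```python
import numpy as np

def cosine_similarity(a, b):
    """Row-wise cosine similarity between two 2-D arrays of embeddings."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def sensitivity_score(image_emb, benign_text_embs, malignant_text_embs):
    """Hypothetical score: how much the image's similarity drops when
    benign class descriptions are swapped for malignant ones."""
    image = image_emb[None, :]
    benign = cosine_similarity(image, benign_text_embs).mean()
    malignant = cosine_similarity(image, malignant_text_embs).mean()
    return float(benign - malignant)

def is_suspected_backdoor(image_emb, benign_text_embs, malignant_text_embs,
                          threshold=0.1):
    """Flag an image whose similarity barely reacts to the prompt change.
    The threshold is an assumed value, not one taken from the paper."""
    score = sensitivity_score(image_emb, benign_text_embs, malignant_text_embs)
    return score < threshold

# Toy embeddings (assumed, not real CLIP features).
benign_texts = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
malignant_texts = np.array([[0.0, 1.0, 0.0], [0.0, 0.9, 0.1]])
clean_image = np.array([1.0, 0.0, 0.0])      # aligned with benign prompts
triggered_image = np.array([0.0, 0.0, 1.0])  # reacts to neither prompt set

print(is_suspected_backdoor(clean_image, benign_texts, malignant_texts))
print(is_suspected_backdoor(triggered_image, benign_texts, malignant_texts))
```

In this toy setup the clean image shows a large similarity gap between the two prompt sets, while the triggered image shows almost none, which is the kind of distribution difference the summary above refers to. In practice the embeddings would come from the model under test and the threshold would need to be chosen empirically.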
Taxonomy
Topics: Web Data Mining and Analysis
Methods: Focus · Contrastive Learning · Contrastive Language-Image Pre-training
