BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
Jiawang Bai, Kuofeng Gao, Shaobo Min, Shu-Tao Xia, Zhifeng Li, Wei Liu

TL;DR
BadCLIP is a backdoor attack on CLIP that uses trigger-aware prompts to influence both the image and text encoders, achieving high attack success rates with limited data and generalizing well across datasets.
Contribution
This work presents BadCLIP, the first backdoor attack on CLIP that is injected through prompt learning and influences both the image and text encoders. It remains effective with limited data and achieves high attack success and strong generalization.
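To make the idea of a trigger-aware prompt concrete, here is a toy sketch in Python/NumPy. It assumes, for illustration only, a CoCoOp-style design in which a small meta-network (`W_meta`, hypothetical) adds an image-conditioned shift to learnable context vectors. The encoders are random frozen linear maps, not CLIP, and all names and sizes are hypothetical. What it shows: because the prompt depends on the image feature, adding a trigger to the image changes both the image embedding and the text embeddings used for classification.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IMG, D_EMB, N_CTX, N_CLS = 32, 16, 4, 3  # toy sizes (hypothetical)

# Frozen toy "encoders": random linear maps standing in for CLIP's two towers.
W_img = rng.normal(size=(D_IMG, D_EMB))
W_txt = rng.normal(size=(D_EMB, D_EMB))

# Learnable pieces (hypothetical): shared context vectors, a meta-net that
# shifts the context based on the image feature, class-name token embeddings,
# and an additive image trigger.
ctx = rng.normal(size=(N_CTX, D_EMB))
W_meta = rng.normal(size=(D_EMB, D_EMB)) * 0.1
class_tok = rng.normal(size=(N_CLS, D_EMB))
trigger = rng.normal(size=D_IMG) * 0.5

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def encode_image(x):
    return normalize(x @ W_img)

def encode_text(image_feat):
    # Image-conditioned prompt: every context token is shifted by a projection
    # of the image feature, so whatever changes the image also changes the text side.
    shift = image_feat @ W_meta                                   # (D_EMB,)
    prompt_ctx = ctx + shift                                      # (N_CTX, D_EMB)
    # Toy text tower: mean-pool [context tokens; class token], then project.
    pooled = (prompt_ctx.sum(axis=0) + class_tok) / (N_CTX + 1)   # (N_CLS, D_EMB)
    return normalize(pooled @ W_txt)

def logits(x, scale=100.0):
    f_img = encode_image(x)
    f_txt = encode_text(f_img)
    return scale * f_txt @ f_img                                  # (N_CLS,)

x = rng.normal(size=D_IMG)             # a toy "image"
clean_img = encode_image(x)
bad_img = encode_image(x + trigger)    # the same image with the trigger added
img_shift = np.linalg.norm(clean_img - bad_img)
txt_shift = np.linalg.norm(encode_text(clean_img) - encode_text(bad_img))
print(f"image-feature shift: {img_shift:.3f}, text-feature shift: {txt_shift:.3f}")
```

No training loop is shown. Under the assumed setup, only `ctx`, `W_meta`, and `trigger` would be optimized, with the encoder weights left frozen, which is what makes the attack fit the prompt learning stage.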
Findings
Attack success rate exceeds 99% in most cases.
Maintains clean accuracy comparable to advanced prompt learning methods.
Effective across multiple datasets and unseen classes.
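The two headline metrics can be computed as below. This is a minimal sketch with made-up predictions (not results from the paper); it follows the common convention of excluding samples whose true class is already the target when computing attack success rate (ASR), which may differ from the paper's exact evaluation protocol.

```python
import numpy as np

def clean_accuracy(pred_clean, labels):
    """Fraction of clean test images classified correctly."""
    pred_clean, labels = np.asarray(pred_clean), np.asarray(labels)
    return float(np.mean(pred_clean == labels))

def attack_success_rate(pred_triggered, labels, target):
    """Fraction of triggered images predicted as `target`, counting only
    images whose true label is not already the target class."""
    pred_triggered, labels = np.asarray(pred_triggered), np.asarray(labels)
    mask = labels != target
    return float(np.mean(pred_triggered[mask] == target))

# Toy, illustrative predictions; target class is 2.
labels         = [0, 1, 2, 2, 1, 0]
pred_clean     = [0, 1, 2, 1, 1, 0]   # one mistake on clean inputs
pred_triggered = [2, 2, 2, 2, 0, 2]   # one non-target sample resists the trigger
acc = clean_accuracy(pred_clean, labels)
asr = attack_success_rate(pred_triggered, labels, target=2)
print(f"clean accuracy: {acc:.3f}, ASR: {asr:.3f}")
```

A successful backdoor keeps `acc` close to that of an unattacked prompt-learned model while pushing `asr` toward 1.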
Abstract
Contrastive Vision-Language Pre-training, known as CLIP, has shown promising effectiveness in addressing downstream image recognition tasks. However, recent works revealed that the CLIP model can be implanted with a downstream-oriented backdoor. On downstream tasks, the victim model performs well on clean samples but predicts a specific target class whenever a specific trigger is present. For injecting a backdoor, existing attacks depend on a large amount of additional data to maliciously fine-tune the entire pre-trained CLIP model, which makes them inapplicable to data-limited scenarios. In this work, motivated by the recent success of learnable prompts, we address this problem by injecting a backdoor into the CLIP model in the prompt learning stage. Our method named BadCLIP is built on a novel and effective mechanism in backdoor attacks on CLIP, i.e., influencing both the image and…
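The attack goal described above (normal behavior on clean inputs, target-class prediction whenever the trigger is present) is commonly written as a two-term loss. The sketch below assumes that generic form, L = CE(f(x), y) + λ · CE(f(x + δ), t), computed from precomputed logits. The weight `lam` and the function names are hypothetical, and the excerpt does not state BadCLIP's exact objective.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy; logits (N, C), integer labels (N,)."""
    logits, labels = np.asarray(logits, dtype=float), np.asarray(labels)
    z = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def backdoor_objective(clean_logits, triggered_logits, labels, target, lam=1.0):
    """Clean term preserves normal accuracy; poisoned term pulls triggered
    inputs toward `target`. In a prompt-learning attack, only prompt-side
    (and trigger) parameters would receive gradients; CLIP stays frozen."""
    labels = np.asarray(labels)
    target_labels = np.full_like(labels, target)
    return cross_entropy(clean_logits, labels) + lam * cross_entropy(triggered_logits, target_labels)

# Uniform logits over 4 classes: each term equals log(4).
print(backdoor_objective(np.zeros((2, 4)), np.zeros((2, 4)), [0, 1], target=3))
```

Raising `lam` trades some clean accuracy for a stronger pull toward the target class; the balance between the two terms is what the "high ASR with comparable clean accuracy" findings measure.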
Taxonomy
Topics: Adversarial Robustness in Machine Learning · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Language-Image Pre-training
