Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks
Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman

TL;DR
This paper introduces SAFECLIP, a novel pre-training method that significantly enhances CLIP's robustness against targeted data poisoning and backdoor attacks by identifying and isolating risky data during training.
Contribution
SAFECLIP is a new defense approach that uses unimodal contrastive learning and Gaussian Mixture Models to prevent poisoning attacks during CLIP pre-training.
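To make the "Gaussian Mixture Models" part concrete, here is a minimal sketch of how a two-component mixture could separate likely-clean pairs from risky ones. It assumes the quantity being modelled is a per-pair image-caption similarity score and that the low-similarity component is the risky one; the summary above does not state either detail. The function names, the 0.5 threshold, and the toy scores are all hypothetical, and the EM loop is a plain textbook 1-D version, not the authors' code.

```python
import math


def responsibilities(values, means, stds, weights):
    """Posterior probability of each component for every value (E-step)."""
    out = []
    for v in values:
        dens = [
            weights[k]
            * math.exp(-0.5 * ((v - means[k]) / stds[k]) ** 2)
            / (stds[k] * math.sqrt(2.0 * math.pi))
            for k in range(2)
        ]
        total = sum(dens) or 1e-300  # guard against underflow to zero
        out.append([d / total for d in dens])
    return out


def fit_two_component_gmm(values, iters=100):
    """Fit a 1-D, two-component Gaussian mixture with plain EM."""
    lo, hi = min(values), max(values)
    means = [lo, hi]  # initialise the two means at the extremes
    spread = max((hi - lo) / 4.0, 1e-3)
    stds = [spread, spread]
    weights = [0.5, 0.5]
    for _ in range(iters):
        resp = responsibilities(values, means, stds, weights)
        for k in range(2):  # M-step: re-estimate each component
            nk = sum(r[k] for r in resp) or 1e-12
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            stds[k] = max(math.sqrt(var), 1e-3)  # floor avoids collapse
            weights[k] = nk / len(values)
    return means, stds, weights


def split_safe_risky(similarities, threshold=0.5):
    """Indices whose posterior for the high-mean component is >= threshold
    are treated as safe; the rest are treated as risky (an assumption)."""
    means, stds, weights = fit_two_component_gmm(similarities)
    high = 0 if means[0] > means[1] else 1
    resp = responsibilities(similarities, means, stds, weights)
    safe = [i for i, r in enumerate(resp) if r[high] >= threshold]
    risky = [i for i, r in enumerate(resp) if r[high] < threshold]
    return safe, risky


# Toy similarity scores (hypothetical): two pairs are clearly mismatched.
scores = [0.82, 0.79, 0.85, 0.80, 0.12, 0.78, 0.15, 0.83]
safe, risky = split_safe_risky(scores)
print("safe:", safe, "risky:", risky)
```

In this toy run the two low-scoring pairs (indices 4 and 6) land in the risky group. In practice the threshold trades off how much clean data is kept against how many poisoned pairs slip through.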
Findings
Reduces targeted poisoning attack success rate from 93.75% to 0%.
Reduces backdoor attack success rate from as high as 100% to 0%.
Maintains CLIP's original performance on benchmark datasets.
Abstract
Contrastive Language-Image Pre-training (CLIP) on large image-caption datasets has achieved remarkable success in zero-shot classification and enabled transferability to new domains. However, CLIP is far more vulnerable to targeted data poisoning and backdoor attacks than supervised learning. Perhaps surprisingly, poisoning 0.0001% of CLIP pre-training data is enough to make targeted data poisoning attacks successful. This is four orders of magnitude smaller than what is required to poison supervised models. Despite this vulnerability, existing methods are very limited in defending CLIP models during pre-training. In this work, we propose a strong defense, SAFECLIP, to safely pre-train CLIP against targeted data poisoning and backdoor attacks. SAFECLIP warms up the model by applying unimodal contrastive learning (CL) on image and text modalities separately. Then, it…
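The unimodal warm-up mentioned in the abstract can be illustrated with a small contrastive loss computed inside a single modality. The sketch below assumes a one-directional InfoNCE objective over two augmented views of the same samples (SimCLR-style); the abstract does not specify the exact loss, so the function `info_nce`, the temperature of 0.1, and the toy embeddings are illustrative assumptions only.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1e-12
    return [x / norm for x in vec]


def info_nce(view_a, view_b, temperature=0.1):
    """Mean one-directional InfoNCE loss within one modality.

    view_a[i] and view_b[i] are embeddings of two augmentations of the
    same sample (the positive pair); every other view_b[j] is a negative.
    """
    a = [normalize(v) for v in view_a]
    b = [normalize(v) for v in view_b]
    loss = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(a)


# Toy embeddings (hypothetical): two samples, two views each.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]     # each view matches its anchor
mismatched = [[0.1, 0.9], [0.9, 0.1]]  # views swapped between samples

loss_aligned = info_nce(anchors, aligned)
loss_mismatched = info_nce(anchors, mismatched)
print(loss_aligned, loss_mismatched)
```

Because positives come from the same modality, a loss of this kind never pulls an image toward its (possibly poisoned) caption, which is the intuition behind applying it to images and text separately during warm-up.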
Taxonomy
Topics: COVID-19 diagnosis using AI · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Language-Image Pre-training · Contrastive Learning
