Robust Defense Strategies for Multimodal Contrastive Learning: Efficient Fine-tuning Against Backdoor Attacks
Md. Iqbal Hossain, Afia Sajeeda, Neeresh Kumar Perla, Ming Shao

TL;DR
This paper proposes a novel, efficient method to detect and mitigate backdoor attacks in multimodal contrastive models like CLIP, by identifying triggers and affected labels to fine-tune and restore model robustness.
Contribution
The paper introduces a strategy that uses an image segmentation oracle to supervise the output of a poisoned CLIP model, identifying backdoor triggers and affected labels so that the model can be fine-tuned in a targeted way.
Findings
Effective detection of backdoor triggers and victim labels.
Successful rectification of poisoned CLIP models.
Improved robustness demonstrated on visual benchmarks.
Abstract
The advent of multimodal deep learning models, such as CLIP, has unlocked new frontiers in a wide range of applications, from image-text understanding to classification tasks. However, these models are not safe from adversarial attacks, particularly backdoor attacks, which can subtly manipulate model behavior. Moreover, existing defense methods typically involve training from scratch or fine-tuning using a large dataset without pinpointing the specific labels that are affected. In this study, we introduce an innovative strategy to enhance the robustness of multimodal contrastive learning models against such attacks. In particular, given a poisoned CLIP model, our approach can identify the backdoor trigger and pinpoint the victim samples and labels in an efficient manner. To that end, an image segmentation "oracle" is introduced as the supervisor for the output of the poisoned CLIP. We…
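The idea of using an oracle to supervise a poisoned model's output can be illustrated with a minimal sketch. This is not the paper's algorithm, whose details are not given in the abstract above. It assumes that for each image we already have the poisoned model's predicted label and the set of labels an independent segmentation oracle found in that image. The function names `flag_disagreements` and `suspect_labels` and the `min_share` threshold are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import List, Sequence, Set, Tuple


def flag_disagreements(
    model_preds: Sequence[str],
    oracle_labels: Sequence[Set[str]],
) -> List[int]:
    """Return indices of samples whose predicted label is absent
    from the labels the oracle found in the same image."""
    if len(model_preds) != len(oracle_labels):
        raise ValueError("model_preds and oracle_labels must have equal length")
    return [
        i
        for i, (pred, seen) in enumerate(zip(model_preds, oracle_labels))
        if pred not in seen
    ]


def suspect_labels(
    model_preds: Sequence[str],
    oracle_labels: Sequence[Set[str]],
    min_share: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Tuple[str, float]]:
    """Return labels that account for at least `min_share` of all
    model/oracle disagreements, with their share of the disagreements."""
    flagged = flag_disagreements(model_preds, oracle_labels)
    if not flagged:
        return []
    counts = Counter(model_preds[i] for i in flagged)
    total = len(flagged)
    return [
        (label, count / total)
        for label, count in counts.most_common()
        if count / total >= min_share
    ]
```

Under these assumptions, a label that dominates the disagreements is a candidate for closer inspection, and the flagged samples are candidates for carrying a trigger. A real pipeline would also need to handle ordinary misclassifications and oracle errors, which this sketch does not separate from backdoor behavior.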
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Advanced Graph Neural Networks
