Backdoor Attacks on Self-Supervised Learning
Aniruddha Saha, Ajinkya Tejankar, Soroush Abbasi Koohpayegani, Hamed Pirsiavash

TL;DR
This paper demonstrates that self-supervised learning methods are vulnerable to backdoor attacks via data poisoning, and proposes a defense mechanism using knowledge distillation to mitigate such attacks.
Contribution
First to identify and analyze backdoor vulnerabilities in self-supervised learning, and to propose an effective defense strategy based on knowledge distillation.
Findings
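Backdoor attacks cause targeted false positives in self-supervised models: at test time, images containing the trigger are misclassified as the attacker's chosen category.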
Poisoning large unlabeled datasets is a practical and effective attack vector.
Knowledge distillation can neutralize backdoor effects.
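The distillation defense in the findings above can be illustrated with a toy NumPy example. It sketches only the general idea, under assumptions that are not taken from the paper: teacher and student are linear maps, the student is fit by gradient descent on a mean-squared-error embedding-matching loss, and the clean set never activates the last input dimensions, which stand in for trigger features. The paper distills deep networks, and its distillation objective may differ. `distill_linear_student` is a hypothetical name.

```python
import numpy as np


def distill_linear_student(teacher_W, clean_x, dim_out, lr=0.1, steps=500, seed=0):
    """Toy knowledge distillation (hypothetical helper, not the paper's code).

    Fits a linear student s(x) = x @ S to mimic a frozen linear teacher
    t(x) = x @ teacher_W using clean inputs only, by gradient descent on the
    mean squared error between student and teacher embeddings.
    Returns the student weights and the per-step loss history.
    """
    rng = np.random.default_rng(seed)
    n, d = clean_x.shape
    # Small random initialization of the student weights.
    S = rng.normal(scale=0.01, size=(d, dim_out))
    # Teacher embeddings of the clean data; the teacher is never updated.
    targets = clean_x @ teacher_W
    history = []
    for _ in range(steps):
        diff = clean_x @ S - targets  # student minus teacher, shape (n, dim_out)
        history.append(float(np.mean(np.sum(diff ** 2, axis=1))))
        # Gradient of the mean squared error with respect to S.
        grad = 2.0 * clean_x.T @ diff / n
        S -= lr * grad
    return S, history
```

Because the trigger dimensions are zero in every clean input, the matching rows of the student receive exactly zero gradient and stay near their small initialization. The student therefore matches the teacher on clean data while responding only weakly to a trigger-only input. Real backdoor features are not this cleanly separated, so the toy conveys the intuition rather than the paper's result.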
Abstract
Large-scale unlabeled data has spurred recent progress in self-supervised learning methods that learn rich visual representations. State-of-the-art self-supervised methods for learning representations from images (e.g., MoCo, BYOL, MSF) use an inductive bias that random augmentations (e.g., random crops) of an image should produce similar embeddings. We show that such methods are vulnerable to backdoor attacks, where an attacker poisons a small part of the unlabeled data by adding a trigger (an image patch chosen by the attacker) to the images. The model performs well on clean test images, but the attacker can manipulate the model's decisions by showing the trigger at test time. Backdoor attacks have been studied extensively in supervised learning, and to the best of our knowledge, we are the first to study them for self-supervised learning. Backdoor attacks are more practical in…
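The poisoning step described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code. It assumes images are stored as an N x H x W x C array, that the attacker can pick out images of one target category (represented by a `labels` array that the victim never trains on), and that the trigger is pasted at a random location. The function names, the `rate` parameter, and the placement rule are hypothetical choices.

```python
import numpy as np


def paste_trigger(image, trigger, rng):
    """Paste `trigger` (h x w x C) at a random location inside `image` (H x W x C).

    Hypothetical helper. Returns a new array; the input image is not modified.
    """
    H, W = image.shape[:2]
    h, w = trigger.shape[:2]
    top = rng.integers(0, H - h + 1)
    left = rng.integers(0, W - w + 1)
    poisoned = image.copy()
    poisoned[top:top + h, left:left + w] = trigger
    return poisoned


def poison_dataset(images, labels, target_class, trigger, rate, seed=0):
    """Add the trigger to a fraction `rate` (0 to 1) of the target-category images.

    Hypothetical helper. `labels` is used only by the attacker to choose which
    images to poison; the victim's self-supervised training sees images only.
    Returns the poisoned copy of the dataset and the indices that were poisoned.
    """
    rng = np.random.default_rng(seed)
    images = images.copy()
    candidates = np.flatnonzero(labels == target_class)
    n_poison = int(round(rate * len(candidates)))
    chosen = rng.choice(candidates, size=n_poison, replace=False)
    for i in chosen:
        images[i] = paste_trigger(images[i], trigger, rng)
    return images, chosen
```

The poisoned array would then replace the clean one as input to an unmodified self-supervised training pipeline; nothing about the training code itself needs to change for the attack to apply.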
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Batch Normalization · InfoNCE · Bootstrap Your Own Latent · Momentum Contrast · Knowledge Distillation
