Backdoor Attacks on Self-Supervised Learning

Aniruddha Saha; Ajinkya Tejankar; Soroush Abbasi Koohpayegani; Hamed Pirsiavash

arXiv:2105.10123·cs.CV·June 10, 2022

TL;DR

This paper demonstrates that self-supervised learning methods are vulnerable to backdoor attacks via data poisoning, and proposes a defense mechanism using knowledge distillation to mitigate such attacks.

Contribution

To the best of the authors' knowledge, this is the first study of backdoor attacks on self-supervised learning. The paper also proposes a defense based on knowledge distillation.

Findings

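01 A backdoored self-supervised model performs well on clean test images, but showing the trigger at test time lets the attacker manipulate its decisions, for example by causing false positives.

02 Poisoning only a small part of a large unlabeled dataset is enough to mount the attack, which makes it practical.

03 Knowledge distillation can neutralize the backdoor.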
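
The page does not spell out how the distillation defense is formulated, so the following is only a generic sketch of feature-based knowledge distillation, not the authors' method. It assumes a student network is trained on clean images to reproduce the embeddings of a (possibly backdoored) teacher; the function names `l2_normalize` and `distillation_loss` and the cosine-based objective are illustrative choices.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row of `x` (N, D) to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def distillation_loss(student_emb, teacher_emb):
    """Mean (1 - cosine similarity) between student and teacher embeddings.

    Both arrays have shape (N, D): one embedding per clean image.
    The loss is 0 when every student embedding points in the same
    direction as the matching teacher embedding, and 2 when opposite.
    """
    s = l2_normalize(student_emb)
    t = l2_normalize(teacher_emb)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))


# Hypothetical usage: random arrays stand in for real network outputs.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 16))
student = teacher + 0.1 * rng.normal(size=(4, 16))
loss = distillation_loss(student, teacher)
```

The intuition under this assumption is that the trigger never appears in the clean distillation data, so the student has little opportunity to inherit the teacher's response to it.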

Abstract

Large-scale unlabeled data has spurred recent progress in self-supervised learning methods that learn rich visual representations. State-of-the-art self-supervised methods for learning representations from images (e.g., MoCo, BYOL, MSF) use an inductive bias that random augmentations (e.g., random crops) of an image should produce similar embeddings. We show that such methods are vulnerable to backdoor attacks - where an attacker poisons a small part of the unlabeled data by adding a trigger (image patch chosen by the attacker) to the images. The model performance is good on clean test images, but the attacker can manipulate the decision of the model by showing the trigger at test time. Backdoor attacks have been studied extensively in supervised learning and to the best of our knowledge, we are the first to study them for self-supervised learning. Backdoor attacks are more practical in…
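
The poisoning step described in the abstract (adding an attacker-chosen image patch to a small part of the unlabeled data) can be sketched as below. This is a minimal illustration, not the authors' code: the helper names `paste_trigger` and `poison_dataset`, the random patch position, and the uniform choice of which images to poison are all assumptions made for the example.

```python
import numpy as np


def paste_trigger(image, trigger, top, left):
    """Return a copy of `image` (H, W, C) with `trigger` (h, w, C) pasted at (top, left)."""
    h, w = trigger.shape[:2]
    if top < 0 or left < 0 or top + h > image.shape[0] or left + w > image.shape[1]:
        raise ValueError("trigger does not fit inside the image at this position")
    patched = image.copy()
    patched[top:top + h, left:left + w] = trigger
    return patched


def poison_dataset(images, trigger, poison_rate, seed=0):
    """Paste `trigger` onto a fraction `poison_rate` of `images` (N, H, W, C).

    Returns the poisoned copy and the sorted indices of the modified images.
    Which images are chosen, and where the patch lands, are assumptions of
    this sketch; the original array is left untouched.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    n_poison = int(round(poison_rate * n))
    chosen = rng.choice(n, size=n_poison, replace=False)
    poisoned = images.copy()
    h, w = trigger.shape[:2]
    for i in chosen:
        top = rng.integers(0, images.shape[1] - h + 1)
        left = rng.integers(0, images.shape[2] - w + 1)
        poisoned[i] = paste_trigger(images[i], trigger, top, left)
    return poisoned, np.sort(chosen)


# Hypothetical usage: 100 blank 32x32 RGB images and a 4x4 white patch.
images = np.zeros((100, 32, 32, 3), dtype=np.uint8)
trigger = np.full((4, 4, 3), 255, dtype=np.uint8)
poisoned, poisoned_idx = poison_dataset(images, trigger, poison_rate=0.05)
```

At test time the attacker would paste the same patch onto an input with `paste_trigger` to activate the backdoor.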

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning

Methods: Batch Normalization · InfoNCE · Bootstrap Your Own Latent · Momentum Contrast · Knowledge Distillation