Training set cleansing of backdoor poisoning by self-supervised representation learning

H. Wang; S. Karami; O. Dia; H. Ritter; E. Emamjomeh-Zadeh; J. Chen; Z. Xiang; D.J. Miller; G. Kesidis

arXiv:2210.10272·cs.LG·March 15, 2023

Open Access

TL;DR

This paper proposes a novel data cleansing method using self-supervised representation learning to effectively mitigate backdoor poisoning attacks in image classification, outperforming existing techniques.

Contribution

It introduces a self-supervised learning-based approach for backdoor defense that combines sample filtering and re-labeling, achieving state-of-the-art results.

Findings

01. Effective backdoor mitigation on CIFAR-10

02. Outperforms existing defense methods

03. Utilizes self-supervised embeddings to identify poisoned samples

Abstract

A backdoor or Trojan attack is an important type of data poisoning attack against deep neural network (DNN) classifiers, wherein the training dataset is poisoned with a small number of samples that each possess the backdoor pattern (usually a pattern that is either imperceptible or innocuous) and which are mislabeled to the attacker's target class. When trained on a backdoor-poisoned dataset, a DNN behaves normally on most benign test samples but makes incorrect predictions to the target class when the test sample has the backdoor pattern incorporated (i.e., contains a backdoor trigger). Here we focus on image classification tasks and show that supervised training may build stronger association between the backdoor pattern and the associated target class than that between normal features and the true class of origin. By contrast, self-supervised representation learning ignores the…
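The poisoning step described in the abstract can be illustrated with a minimal sketch. It shows only the attack setup (a few samples receive a backdoor pattern and are relabeled to the target class), not the cleansing method itself. The function name `poison_dataset`, the bright corner patch used as the trigger, the `poison_rate` and `patch_size` parameters, and the toy grayscale images are all assumptions made for illustration; they are not taken from the paper.

```python
import random


def poison_dataset(images, labels, target_class, poison_rate=0.05,
                   patch_size=2, seed=0):
    """Return poisoned copies of (images, labels) plus the poisoned indices.

    images: list of 2-D lists (H x W grayscale, values in [0, 1]).
    labels: list of int class labels.
    Hypothetical helper: the corner-patch trigger is an illustrative choice.
    """
    rng = random.Random(seed)
    # Only samples from non-target classes are candidates for poisoning.
    candidates = [i for i, y in enumerate(labels) if y != target_class]
    n_poison = min(len(candidates), max(1, round(poison_rate * len(labels))))
    poisoned_idx = sorted(rng.sample(candidates, n_poison))

    # Copy so the clean dataset is left untouched.
    new_images = [[row[:] for row in img] for img in images]
    new_labels = list(labels)
    for i in poisoned_idx:
        img = new_images[i]
        # Stamp a small bright patch in the bottom-right corner (the trigger).
        for r in range(len(img) - patch_size, len(img)):
            for c in range(len(img[r]) - patch_size, len(img[r])):
                img[r][c] = 1.0
        # Mislabel the sample to the attacker's target class.
        new_labels[i] = target_class
    return new_images, new_labels, poisoned_idx


# Toy usage: 20 blank 4x4 images across 4 classes, 10% poisoned toward class 0.
images = [[[0.0] * 4 for _ in range(4)] for _ in range(20)]
labels = [i % 4 for i in range(20)]
p_images, p_labels, idx = poison_dataset(images, labels, target_class=0,
                                         poison_rate=0.1)
```

In this toy setup, a classifier trained on `p_images` and `p_labels` would see the corner patch only alongside the target label, which is the kind of shortcut association the abstract says supervised training can pick up.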

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications

Methods: Test