Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing

Zhen Xiang; David J. Miller; George Kesidis

arXiv:2010.07489 · cs.LG · October 16, 2020

Open Access

TL;DR

This paper introduces an optimization-based method to detect, identify, and reverse engineer imperceptible backdoor patterns in training data, significantly improving defense against backdoor attacks on neural networks.

Contribution

It presents a novel reverse-engineering defense that detects poisoned training sets, identifies backdoor images and patterns, and enhances robustness against imperceptible backdoor attacks.

Findings

1. Achieves state-of-the-art detection accuracy on CIFAR-10.
2. Reduces attack success rate to below 5%.
3. Effectively identifies backdoor patterns and poisoned images.

Abstract

Backdoor data poisoning is an emerging form of adversarial attack, usually against deep neural network image classifiers. The attacker poisons the training set with a relatively small set of images from one (or several) source class(es), embedded with a backdoor pattern and labeled to a target class. For a successful attack, during operation, the trained classifier will: 1) misclassify a test image from the source class(es) to the target class whenever the same backdoor pattern is present; 2) maintain a high classification accuracy for backdoor-free test images. In this paper, we make a breakthrough in defending against backdoor attacks with imperceptible backdoor patterns (e.g., watermarks) before/during the training phase. This is a challenging problem because it is a priori unknown which subset (if any) of the training set has been poisoned. We propose an optimization-based…
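The threat model described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: it assumes images are flat lists of floats in [0, 1], that the backdoor pattern is a small additive perturbation clipped to the valid range, and that poisoned copies are appended to the clean training set. The helper names (`embed_pattern`, `poison_training_set`, `attack_success_rate`) are hypothetical and exist only for this illustration.

```python
import random


def embed_pattern(image, pattern):
    """Add a small perturbation to an image and clip each pixel to [0, 1].

    Assumption: the imperceptible pattern is additive; the paper may also
    cover other embedding mechanisms.
    """
    return [min(1.0, max(0.0, p + d)) for p, d in zip(image, pattern)]


def poison_training_set(train_set, pattern, source_classes, target_class,
                        n_poison, seed=0):
    """Return a new training set with n_poison backdoor samples appended.

    train_set is a list of (image, label) pairs. Poisoned samples are
    source-class images with the pattern embedded, relabeled to target_class.
    """
    rng = random.Random(seed)
    candidates = [(x, y) for x, y in train_set if y in source_classes]
    chosen = rng.sample(candidates, n_poison)
    poisoned = [(embed_pattern(x, pattern), target_class) for x, _ in chosen]
    return train_set + poisoned


def attack_success_rate(classify, test_set, pattern, source_classes,
                        target_class):
    """Fraction of source-class test images that are classified to the
    target class once the backdoor pattern is embedded (success condition 1)."""
    sources = [x for x, y in test_set if y in source_classes]
    hits = sum(classify(embed_pattern(x, pattern)) == target_class
               for x in sources)
    return hits / len(sources)


# Toy data: 4-pixel "images" with labels 0..2 (all values hypothetical).
clean = [([0.5, 0.5, 0.5, 0.5], label) for label in (0, 0, 1, 1, 2, 2)]
pattern = [0.01, 0.0, 0.0, 0.01]  # small perturbation, assumed imperceptible
poisoned_set = poison_training_set(clean, pattern, source_classes={0, 1},
                                   target_class=2, n_poison=2)
```

A defender sees only `poisoned_set` and does not know which entries, if any, were added by the attacker. That is the difficulty the abstract points to.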

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Integrated Circuits and Semiconductor Failure Analysis · Anomaly Detection Techniques and Applications