Reverse Engineering Imperceptible Backdoor Attacks on Deep Neural Networks for Detection and Training Set Cleansing
Zhen Xiang, David J. Miller, George Kesidis

TL;DR
This paper introduces an optimization-based method to detect, identify, and reverse engineer imperceptible backdoor patterns in training data, significantly improving defense against backdoor attacks on neural networks.
Contribution
It presents a novel reverse-engineering defense that detects poisoned training sets, identifies backdoor images and patterns, and enhances robustness against imperceptible backdoor attacks.
Findings
Achieves state-of-the-art detection accuracy on CIFAR-10.
Reduces attack success rate to below 5%.
Effectively identifies backdoor patterns and poisoned images.
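The attack success rate quoted above is not defined on this page. A minimal sketch follows, assuming the common definition: the fraction of source-class test images that the classifier assigns to the target class once the backdoor pattern is embedded. The function name `attack_success_rate`, the additive-and-clip embedding, and the toy classifier are illustrative assumptions, not the authors' code.

```python
import numpy as np

def attack_success_rate(predict, x_source_test, pattern, target_class):
    """Fraction of backdoored source-class images classified to the target class.

    Assumption: the pattern is embedded additively and pixels are clipped to [0, 1].
    """
    x_backdoored = np.clip(x_source_test + pattern, 0.0, 1.0)
    return float(np.mean(predict(x_backdoored) == target_class))

# Hypothetical one-pixel "classifier": predicts class 7 when the pixel exceeds 0.5, else class 3.
def toy_predict(x):
    return np.where(x[:, 0] > 0.5, 7, 3)

x_src = np.array([[0.20], [0.45], [0.49], [0.10]])  # four toy source-class test images
pattern = np.array([0.02])                          # small additive perturbation
asr = attack_success_rate(toy_predict, x_src, pattern, target_class=7)
print(asr)
```

In this toy setup only one of the four perturbed images crosses the decision threshold. A defense that brings this quantity below 0.05 on real data would match the finding listed above.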
Abstract
Backdoor data poisoning is an emerging form of adversarial attack, usually mounted against deep neural network image classifiers. The attacker poisons the training set with a relatively small set of images from one (or several) source class(es), each embedded with a backdoor pattern and labeled to a target class. For a successful attack, during operation, the trained classifier will: 1) misclassify a test image from the source class(es) to the target class whenever the same backdoor pattern is present; 2) maintain high classification accuracy on backdoor-free test images. In this paper, we make a breakthrough in defending against backdoor attacks with imperceptible backdoor patterns (e.g., watermarks) before/during the training phase. This is a challenging problem because it is a priori unknown which subset (if any) of the training set has been poisoned. We propose an optimization-based…
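The poisoning step described in the abstract can be illustrated with a short sketch. It assumes an additive, low-amplitude perturbation as the imperceptible pattern and assumes the attacker appends the backdoor images to the clean set rather than replacing existing ones; the function names, image sizes, and pattern below are hypothetical and are not taken from the paper.

```python
import numpy as np

def embed_backdoor(images, pattern):
    """Add a small perturbation to each image and clip to the valid pixel range [0, 1]."""
    return np.clip(images + pattern, 0.0, 1.0)

def poison_training_set(x, y, pattern, source_class, target_class, n_poison, rng):
    """Return (x, y) with n_poison backdoored source-class images appended,
    each mislabeled to target_class. The clean data is left unchanged."""
    source_idx = np.flatnonzero(y == source_class)
    chosen = rng.choice(source_idx, size=n_poison, replace=False)
    x_bd = embed_backdoor(x[chosen], pattern)
    y_bd = np.full(n_poison, target_class, dtype=y.dtype)
    return np.concatenate([x, x_bd]), np.concatenate([y, y_bd])

rng = np.random.default_rng(0)
x = rng.random((100, 8, 8, 3))       # toy "images" with pixels in [0, 1)
y = np.repeat(np.arange(10), 10)     # 10 classes, 10 images each
pattern = np.zeros((8, 8, 3))
pattern[::2, ::2, :] = 2 / 255       # faint grid, at most 2/255 per pixel (assumed amplitude)

x_p, y_p = poison_training_set(
    x, y, pattern, source_class=3, target_class=7, n_poison=5, rng=rng
)
print(x_p.shape, int((y_p == 7).sum()))
```

From the defender's side, the five appended images look like ordinary members of the target class, which is why it is unknown a priori which subset (if any) of the training set has been poisoned.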
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Integrated Circuits and Semiconductor Failure Analysis · Anomaly Detection Techniques and Applications
