Defending Against Backdoor Attacks by Layer-wise Feature Analysis
Najeeb Moharram Jebreel, Josep Domingo-Ferrer, Yiming Li

TL;DR
This paper introduces a layer-wise feature analysis approach to detect and defend against backdoor attacks in deep neural networks by identifying a critical layer where feature differences are maximized.
Contribution
It reveals the existence of a critical layer for backdoor detection and proposes a simple filtering method based on feature differences at this layer.
Findings
Effective detection of poisoned samples demonstrated on benchmark datasets
Identification of a critical layer different from traditional choices
Significant reduction in backdoor success rate
Abstract
Training deep neural networks (DNNs) usually requires massive training data and computational resources. Users who cannot afford this may prefer to outsource training to a third party or resort to publicly available pre-trained models. Unfortunately, doing so facilitates a new training-time attack (i.e., backdoor attack) against DNNs. This attack aims to induce misclassification of input samples containing adversary-specified trigger patterns. In this paper, we first conduct a layer-wise feature analysis of poisoned and benign samples from the target class. We find that the feature difference between benign and poisoned samples tends to be largest at a critical layer, which is not always the one typically used in existing defenses, namely the layer before the fully-connected layers. We also demonstrate how to locate this critical layer based on the behaviors of benign samples. We then…
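The layer-wise comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of cosine similarity to a benign centroid as the feature-difference measure, the function names (cosine_to_centroid, layerwise_gap, critical_layer), and the synthetic features are all assumptions made for illustration. Unlike the paper, which locates the critical layer from benign samples alone, this sketch assumes a set of known poisoned samples is available and simply picks the layer where the measured gap is largest.

```python
import numpy as np


def cosine_to_centroid(feats, centroid):
    """Cosine similarity of each row of `feats` to a single centroid vector."""
    feats_n = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    centroid_n = centroid / np.linalg.norm(centroid)
    return feats_n @ centroid_n


def layerwise_gap(benign_by_layer, suspect_by_layer):
    """For each layer, mean benign similarity minus mean suspect similarity.

    Both arguments are lists with one (n_samples, n_features) array per layer.
    The centroid is computed from the benign features of that layer.
    """
    gaps = []
    for benign, suspect in zip(benign_by_layer, suspect_by_layer):
        centroid = benign.mean(axis=0)
        benign_sim = cosine_to_centroid(benign, centroid).mean()
        suspect_sim = cosine_to_centroid(suspect, centroid).mean()
        gaps.append(float(benign_sim - suspect_sim))
    return gaps


def critical_layer(gaps):
    """Index of the layer with the largest benign-vs-suspect gap."""
    return int(np.argmax(gaps))


def make_demo_features(seed=0, n=50, dim=8, noise=0.05):
    """Synthetic stand-in for per-layer features (hypothetical data, 3 layers)."""
    rng = np.random.default_rng(seed)
    e0 = np.zeros(dim)
    e0[0] = 1.0
    e1 = np.zeros(dim)
    e1[1] = 1.0
    # Direction of the poisoned features at each layer:
    # layer 0 matches benign, layer 1 is orthogonal, layer 2 is in between.
    poison_dirs = [e0, e1, (e0 + e1) / np.sqrt(2.0)]
    benign = [e0 + noise * rng.standard_normal((n, dim)) for _ in poison_dirs]
    poisoned = [d + noise * rng.standard_normal((n, dim)) for d in poison_dirs]
    return benign, poisoned


benign_feats, poisoned_feats = make_demo_features()
gaps = layerwise_gap(benign_feats, poisoned_feats)
print("per-layer gap:", [round(g, 3) for g in gaps])
print("critical layer index:", critical_layer(gaps))
```

In a real setting the per-layer arrays would come from activations recorded on a trained network, for example with forward hooks, rather than from the synthetic generator used here.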
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Digital Media Forensic Detection
