Defending Against Backdoor Attacks by Layer-wise Feature Analysis

Najeeb Moharram Jebreel; Josep Domingo-Ferrer; Yiming Li

arXiv:2302.12758 · cs.CR · February 27, 2023 · 1 citation


TL;DR

This paper introduces a layer-wise feature analysis approach to detect and defend against backdoor attacks in deep neural networks by identifying a critical layer at which the feature difference between benign and poisoned samples is largest.

Contribution

It reveals the existence of a critical layer for backdoor detection and proposes a simple filtering method based on feature differences at this layer.

Findings

01 Effective detection of poisoned samples demonstrated on benchmark datasets

02 Identification of a critical layer different from traditional choices

03 Significant reduction in backdoor success rate

Abstract

Training deep neural networks (DNNs) usually requires massive training data and computational resources. Users who cannot afford this may prefer to outsource training to a third party or resort to publicly available pre-trained models. Unfortunately, doing so facilitates a new training-time attack (i.e., backdoor attack) against DNNs. This attack aims to induce misclassification of input samples containing adversary-specified trigger patterns. In this paper, we first conduct a layer-wise feature analysis of poisoned and benign samples from the target class. We find that the feature difference between benign and poisoned samples tends to be largest at a critical layer, which is not always the one typically used in existing defenses, namely the layer before the fully-connected layers. We also demonstrate how to locate this critical layer based on the behaviors of benign samples. We then…
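The layer-wise comparison described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes per-layer feature vectors have already been extracted, assumes the poisoned samples are known (as in an offline analysis, not in the defense itself), and uses cosine similarity to the benign class centroid as the similarity measure, which is a choice made here for illustration. The function names `layer_gaps` and `critical_layer` and the toy data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def layer_gaps(benign_feats, poisoned_feats):
    """Per-layer gap in mean similarity to the benign centroid.

    benign_feats[l] and poisoned_feats[l] are lists of feature vectors
    taken at layer l (hypothetical, pre-extracted inputs).
    """
    gaps = []
    for benign, poisoned in zip(benign_feats, poisoned_feats):
        center = centroid(benign)
        mean_benign = sum(cosine(x, center) for x in benign) / len(benign)
        mean_poisoned = sum(cosine(x, center) for x in poisoned) / len(poisoned)
        gaps.append(mean_benign - mean_poisoned)
    return gaps


def critical_layer(gaps):
    """Index of the layer with the largest benign-vs-poisoned gap."""
    return max(range(len(gaps)), key=gaps.__getitem__)


# Toy 2-D features for three layers (made-up numbers, for illustration only).
benign_feats = [
    [[1.0, 0.1], [0.9, 0.2]],  # layer 0
    [[1.0, 0.0], [0.9, 0.1]],  # layer 1
    [[1.0, 0.2], [0.8, 0.1]],  # layer 2
]
poisoned_feats = [
    [[1.0, 0.2], [0.9, 0.1]],  # layer 0: close to the benign samples
    [[0.0, 1.0], [0.1, 0.9]],  # layer 1: far from the benign samples
    [[0.7, 0.6], [0.6, 0.5]],  # layer 2: moderately different
]

gaps = layer_gaps(benign_feats, poisoned_feats)
idx = critical_layer(gaps)
print("gap per layer:", [round(g, 3) for g in gaps])
print("layer with the largest gap:", idx)
```

In this toy setup the middle layer shows the largest gap, mirroring the abstract's point that the most discriminative layer need not be the one just before the fully-connected layers. In a real model the per-layer features would come from the network itself, and the paper locates the critical layer from benign samples alone, which this sketch does not attempt.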


Code & Models

Repositories

najeebjebreel/dbalfa (PyTorch, official)


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Digital Media Forensic Detection