Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks

Yige Li; Xixiang Lyu; Nodens Koren; Lingjuan Lyu; Bo Li; Xingjun Ma

arXiv:2101.05930 · cs.LG · January 28, 2021 · 139 cites

TL;DR

This paper introduces Neural Attention Distillation (NAD), a defense that erases backdoor triggers from deep neural networks by finetuning on a small clean subset of data while maintaining performance on clean examples.

Contribution

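NAD is a defense framework in which a teacher network guides the finetuning of a backdoored student network so that the student's intermediate-layer attention aligns with the teacher's, removing backdoors with limited clean data.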

Findings

01 Effective against 6 state-of-the-art backdoor attacks

02 Removes backdoor triggers using only 5% clean data

03 Maintains performance on clean examples

Abstract

Deep neural networks (DNNs) are known to be vulnerable to backdoor attacks, a training-time attack that injects a trigger pattern into a small proportion of the training data so as to control the model's predictions at test time. Backdoor attacks are notably dangerous because they do not affect the model's performance on clean examples, yet can fool the model into making incorrect predictions whenever the trigger pattern appears during testing. In this paper, we propose a novel defense framework, Neural Attention Distillation (NAD), to erase backdoor triggers from backdoored DNNs. NAD utilizes a teacher network to guide the finetuning of the backdoored student network on a small clean subset of data such that the intermediate-layer attention of the student network aligns with that of the teacher network. The teacher network can be obtained by an independent finetuning process on the same clean subset.…
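
To make the attention-alignment idea in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation (the official repository listed under Code & Models uses PyTorch). It assumes an attention map is formed by summing the p-th power of absolute activations over channels and L2-normalising the result, and that the alignment term is the L2 distance between the student and teacher maps. The names attention_map, attention_distance, nad_objective, and the weight beta are hypothetical and chosen for illustration only.

```python
import math


def attention_map(activation, p=2):
    """Collapse a C x H x W activation (nested lists) into a flat,
    L2-normalised spatial map. The sum-of-|a|^p form is an assumption."""
    channels = len(activation)
    height = len(activation[0])
    width = len(activation[0][0])
    flat = [
        sum(abs(activation[c][i][j]) ** p for c in range(channels))
        for i in range(height)
        for j in range(width)
    ]
    norm = math.sqrt(sum(v * v for v in flat))
    # Small epsilon avoids division by zero for an all-zero activation.
    return [v / (norm + 1e-12) for v in flat]


def attention_distance(student_act, teacher_act, p=2):
    """L2 distance between the normalised student and teacher maps."""
    s = attention_map(student_act, p)
    t = attention_map(teacher_act, p)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, t)))


def nad_objective(task_loss, student_acts, teacher_acts, beta=1.0):
    """Hypothetical combined objective: the task loss on the clean subset
    plus a weighted sum of per-layer attention distances."""
    distill = sum(
        attention_distance(s, t) for s, t in zip(student_acts, teacher_acts)
    )
    return task_loss + beta * distill


if __name__ == "__main__":
    # One-channel 2 x 2 activations that attend to opposite corners.
    student = [[[1.0, 0.0], [0.0, 0.0]]]
    teacher = [[[0.0, 0.0], [0.0, 1.0]]]
    print(attention_distance(student, teacher))
    print(nad_objective(0.5, [student], [teacher], beta=2.0))
```

Because each map is normalised before comparison, the distance depends on where the activations are concentrated rather than on their overall magnitude, so the term penalises a student that attends to different regions than the teacher. In a real finetuning loop the same computation would be done with a tensor library so that gradients flow back into the student network.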

Code & Models

Repositories

bboylyg/NAD (PyTorch, official)

Videos

Neural Attention Distillation: Erasing Backdoor Triggers from Deep Neural Networks · SlidesLive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Neural Network Applications