Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering

Bryant Chen, Wilka Carvalho, Nathalie Baracaldo, Heiko Ludwig, Benjamin Edwards, Taesung Lee, Ian Molloy, and Biplav Srivastava

arXiv:1811.03728 · cs.LG · November 12, 2018 · 231 citations


TL;DR

This paper introduces Activation Clustering (AC), a method that detects and repairs backdoors in neural networks by clustering the network's activations on its own training data. It identifies the poisoned samples and mitigates the malicious behavior without needing a trusted dataset.
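The core clustering idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: activations are given as tuples (in practice they would come from the network's last hidden layer), the dimensionality-reduction step the paper applies before clustering is omitted, and `kmeans_2`, `activation_clustering`, and the `size_threshold` cut-off are hypothetical names and values. Each class's activations are split into two clusters, and the smaller cluster is flagged when it is small relative to its class, one of the cluster analyses the paper discusses.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_2(points, iters=50):
    """Lloyd's algorithm with k=2 and a deterministic farthest-point initialisation."""
    first = points[0]
    second = max(points, key=lambda p: sq_dist(p, first))
    centers = [list(first), list(second)]
    assign = None
    for _ in range(iters):
        # Assignment step: nearest center (ties go to cluster 0).
        new_assign = [min(range(2), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_assign == assign:
            break  # converged
        assign = new_assign
        # Update step: move each non-empty cluster's center to its members' mean.
        for c in range(2):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def activation_clustering(activations, labels, size_threshold=0.35):
    """Return sorted indices of samples flagged as suspected poison.

    activations: one vector per training sample (e.g. last-hidden-layer outputs).
    labels: the training labels, which may include poisoned ones.
    size_threshold: hypothetical cut-off on the smaller cluster's share of its class.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    flagged = []
    for idx in by_class.values():
        if len(idx) < 2:
            continue  # nothing to split
        assign = kmeans_2([activations[i] for i in idx])
        sizes = [assign.count(0), assign.count(1)]
        small = 0 if sizes[0] <= sizes[1] else 1
        # Relative-size heuristic: a small cluster inside a class is suspicious.
        if sizes[small] / len(idx) <= size_threshold:
            flagged.extend(i for i, a in zip(idx, assign) if a == small)
    return sorted(flagged)


if __name__ == "__main__":
    acts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0),  # clean class 0
            (5.0, 5.0), (5.0, 5.0),                                      # poisoned, labelled 0
            (3.0, 0.0), (3.0, 0.0), (3.0, 1.0), (3.0, 1.0)]              # clean class 1
    labels = [0] * 7 + [1] * 4
    print(activation_clustering(acts, labels))  # [5, 6]
```

Note that a clean class is still forced into two clusters (class 1 splits 2/2 in the toy data), so it is the size check, not the clustering alone, that decides what gets flagged; the paper examines further ways of analysing the clusters for the same reason.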

Contribution

The paper presents the first methodology capable of detecting backdoor poisoning and repairing the affected neural network without relying on a verified, trusted dataset.
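Deciding whether a flagged cluster is really poisoned, and repairing the model without trusted data, can be sketched as follows. As we understand the paper's exclusionary reclassification, a model is retrained without the suspicious cluster and then used to classify it; if the retrained model mostly assigns some other label, the cluster is treated as poisoned and that label as its source class. The plain majority comparison below, and the names `judge_cluster` and `relabel`, are hypothetical stand-ins rather than the paper's exact decision rule; the retraining itself is assumed to happen elsewhere.

```python
from collections import Counter


def judge_cluster(cluster_label, predictions):
    """Hypothetical exclusionary-reclassification decision.

    predictions: labels that a model retrained *without* this cluster assigns to
    the cluster's samples. Returns (is_poison, source_label).
    """
    counts = Counter(predictions)
    own = counts.get(cluster_label, 0)
    # Most frequent label other than the cluster's own label.
    for label, n in counts.most_common():
        if label != cluster_label:
            other_label, n_other = label, n
            break
    else:
        return False, cluster_label  # every prediction agrees with the given label
    if own >= n_other:
        return False, cluster_label  # the retrained model still backs the label
    return True, other_label


def relabel(labels, flagged, source_label):
    """Repair step: give flagged samples their presumed source label (returns a copy)."""
    repaired = list(labels)
    for i in flagged:
        repaired[i] = source_label
    return repaired


if __name__ == "__main__":
    is_poison, source = judge_cluster(0, [7, 7, 7, 0])
    print(is_poison, source)                   # True 7
    print(relabel([0, 0, 0], [1, 2], source))  # [0, 7, 7]
```

Relabeling the flagged samples and continuing training from the current model is one repair option; dropping them and retraining is the other.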

Findings

01. Effective detection of backdoor attacks on neural networks.
02. Successful repair of models without requiring a trusted dataset.
03. Demonstrated applicability to both text and image classification tasks.

Abstract

While machine learning (ML) models are being increasingly trusted to make decisions in different and varying areas, the safety of systems using such models has become an increasing concern. In particular, ML models are often trained on data from potentially untrustworthy sources, providing adversaries with the opportunity to manipulate them by inserting carefully crafted samples into the training set. Recent work has shown that this type of attack, called a poisoning attack, allows adversaries to insert backdoors or trojans into the model, enabling malicious behavior with simple external backdoor triggers at inference time and only a blackbox perspective of the model itself. Detecting this type of attack is challenging because the unexpected behavior occurs only when a backdoor trigger, which is known only to the adversary, is present. Model users, either direct users of training data…
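The poisoning attack described in the abstract can be illustrated with a toy sketch: stamp a small trigger onto some training images from a source class and relabel them with an attacker-chosen target class. The square corner trigger, the helpers `add_trigger` and `poison_dataset`, and their parameters are hypothetical illustrations of a common form of this attack, not the paper's experimental setup; images are plain nested lists of pixel intensities.

```python
import random


def add_trigger(image, size=3, value=1.0):
    """Return a copy of `image` with a size x size square stamped in the bottom-right corner."""
    poisoned = [row[:] for row in image]
    h, w = len(poisoned), len(poisoned[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            poisoned[r][c] = value
    return poisoned


def poison_dataset(images, labels, source, target, fraction, seed=0):
    """Stamp the trigger on a fraction of `source` images and relabel them as `target`.

    Returns (new_images, new_labels, poisoned_indices); the inputs are not modified.
    """
    rng = random.Random(seed)
    candidates = [i for i, y in enumerate(labels) if y == source]
    chosen = set(rng.sample(candidates, int(len(candidates) * fraction)))
    new_images, new_labels = [], []
    for i, (img, y) in enumerate(zip(images, labels)):
        if i in chosen:
            new_images.append(add_trigger(img))
            new_labels.append(target)  # mislabel so the model links trigger -> target
        else:
            new_images.append(img)
            new_labels.append(y)
    return new_images, new_labels, sorted(chosen)


if __name__ == "__main__":
    images = [[[0.0] * 4 for _ in range(4)] for _ in range(10)]
    labels = [0] * 5 + [1] * 5
    new_images, new_labels, idx = poison_dataset(images, labels, source=0, target=1, fraction=0.4)
    print(len(idx), [new_labels[i] for i in idx])  # 2 [1, 1]
```

A model trained on the returned data would typically learn to associate the square with `target` while behaving normally on clean inputs, which is why the backdoor is hard to spot from the model's outputs alone.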

Code & Models

Repositories

zhenxianglance/RE-paper (PyTorch)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Malware Detection Techniques