Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering
Bryant Chen, Wilka Carvalho, Nathalie Baracaldo, Heiko Ludwig, Benjamin Edwards, Taesung Lee, Ian Molloy, and Biplav Srivastava

TL;DR
This paper introduces a novel activation clustering method to detect and repair backdoors in neural networks, effectively identifying poisoned data and mitigating malicious behaviors without needing a trusted dataset.
Contribution
The paper presents the first methodology capable of detecting backdoor poisoning and repairing neural networks without relying on a verified trusted dataset.
Findings
Effective detection of backdoor attacks on neural networks.
Successful repair of models without trusted dataset requirements.
Demonstrated applicability on text and image classification tasks.
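Illustrative sketch
The summary above does not spell out the procedure, so the following is only a minimal sketch of the general idea the method's name suggests: take the hidden-layer activations of the training samples that share one label, split them into two clusters, and treat an unusually small cluster as suspect. It is not the authors' implementation. The function names (sqdist, two_means, smaller_cluster), the deterministic farthest-point initialization, and the toy 2-D "activations" are all assumptions made for illustration; a real pipeline would extract activations from a trained network and would likely reduce their dimensionality before clustering.

```python
def sqdist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(points, iters=100):
    """Plain 2-means clustering; returns a 0/1 label for every point.

    Assumption: centers start at the first point and the point farthest
    from it, which keeps the sketch deterministic.
    """
    first = points[0]
    second = max(points, key=lambda p: sqdist(p, first))
    centers = [list(first), list(second)]
    labels = None
    for _ in range(iters):
        new_labels = [
            0 if sqdist(p, centers[0]) <= sqdist(p, centers[1]) else 1
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in (0, 1):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # recompute the center as the mean of its members
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def smaller_cluster(activations):
    """Return (indices of the smaller cluster, its share of all samples)."""
    labels = two_means(activations)
    ones = sum(labels)
    minority = 1 if ones <= len(labels) - ones else 0
    idx = [i for i, l in enumerate(labels) if l == minority]
    return idx, len(idx) / len(labels)


# Toy stand-in for the activations of one class (made-up numbers):
# 20 samples that sit close together, followed by 5 that sit far away.
clean = [[0.01 * i, 1.0 - 0.01 * i] for i in range(20)]
shifted = [[5.0 + 0.01 * i, 5.0] for i in range(5)]
suspect_idx, fraction = smaller_cluster(clean + shifted)
print(suspect_idx, fraction)
```

On this toy input the smaller cluster consists of the five shifted samples, one fifth of the class. How small a cluster must be before it is flagged, and whether cluster size alone is a sufficient signal, are design choices this sketch leaves open.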
Abstract
While machine learning (ML) models are being increasingly trusted to make decisions in different and varying areas, the safety of systems using such models has become an increasing concern. In particular, ML models are often trained on data from potentially untrustworthy sources, providing adversaries with the opportunity to manipulate them by inserting carefully crafted samples into the training set. Recent work has shown that this type of attack, called a poisoning attack, allows adversaries to insert backdoors or trojans into the model, enabling malicious behavior with simple external backdoor triggers at inference time and only a blackbox perspective of the model itself. Detecting this type of attack is challenging because the unexpected behavior occurs only when a backdoor trigger, which is known only to the adversary, is present. Model users, either direct users of training data…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Advanced Malware Detection Techniques
