Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection
Hang Wang, Zhen Xiang, David J. Miller, George Kesidis

TL;DR
This paper introduces an improved activation clipping method that mitigates backdoor attacks in neural networks and enables test-time detection. It outperforms peer methods on CIFAR-10 image classification and remains robust on other datasets.
Contribution
It proposes a novel activation bounding approach that explicitly limits classification margins, improving backdoor mitigation and test-time detection over existing methods.
Findings
Superior backdoor mitigation performance on CIFAR-10
Robustness against adaptive and X2X attacks
Effective test-time detection and correction
Abstract
Deep neural networks are vulnerable to backdoor attacks (Trojans), where an attacker poisons the training set with backdoor triggers so that the neural network learns to classify test-time triggers to the attacker's designated target class. Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model, which motivates a general, post-training clipping method for backdoor mitigation, i.e., with bounds on internal-layer activations learned using a small set of clean samples. We devise a new such approach, choosing the activation bounds to explicitly limit classification margins. This method gives superior performance against peer methods for CIFAR-10 image classification. We also show that this method has strong robustness against adaptive attacks, X2X attacks, and on different datasets. Finally, we demonstrate a method extension for…
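To make the idea of post-training activation clipping concrete, here is a minimal sketch on a toy two-layer ReLU network. It is not the authors' procedure: the paper learns bounds that explicitly limit classification margins, whereas this sketch swaps in a simple per-neuron percentile of clean activations as a stand-in. All names (`hidden`, `forward`, `margin`), the network sizes, the random weights, and the 95th-percentile choice are assumptions made for illustration only.

```python
import numpy as np

def hidden(x, W1, b1, bounds=None):
    """ReLU hidden layer; optionally clip each neuron at its upper bound."""
    h = np.maximum(x @ W1 + b1, 0.0)
    if bounds is not None:
        h = np.minimum(h, bounds)  # activation clipping
    return h

def forward(x, W1, b1, W2, b2, bounds=None):
    """Logits of a toy two-layer network, with optional activation clipping."""
    return hidden(x, W1, b1, bounds) @ W2 + b2

def margin(logits):
    """Classification margin: largest logit minus the runner-up."""
    s = np.sort(logits, axis=-1)
    return s[..., -1] - s[..., -2]

# Toy model with random weights (hypothetical; stands in for a trained network).
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 3)), np.zeros(3)

# A small set of clean samples is used to pick the bounds.
# Assumption: a per-neuron 95th percentile, NOT the paper's margin-based objective.
clean = rng.normal(size=(200, 8))
bounds = np.percentile(hidden(clean, W1, b1), 95, axis=0)

# An input that drives abnormally large activations, loosely mimicking a trigger.
suspicious = 10.0 * rng.normal(size=(1, 8))
margin_raw = margin(forward(suspicious, W1, b1, W2, b2))
margin_clipped = margin(forward(suspicious, W1, b1, W2, b2, bounds))

# With clipping, every logit is bounded in magnitude, so the margin is too.
logit_cap = bounds @ np.abs(W2) + np.abs(b2)
print("margin without clipping:", margin_raw)
print("margin with clipping:   ", margin_clipped)
print("upper limit on clipped margin:", 2 * logit_cap.max())
```

The point of the sketch is the last step: once hidden activations are bounded, the logits, and therefore the classification margin, cannot grow arbitrarily large, regardless of the input. Choosing the bounds so that this limit on the margin is tight while clean accuracy is preserved is the part the paper addresses and this sketch does not.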
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
