Improved Activation Clipping for Universal Backdoor Mitigation and Test-Time Detection

Hang Wang; Zhen Xiang; David J. Miller; George Kesidis

arXiv:2308.04617 · cs.LG · August 10, 2023

TL;DR

This paper introduces an improved activation clipping method for mitigating backdoor attacks in neural networks. It outperforms peer methods on CIFAR-10, remains robust on other datasets and against adaptive attacks, and extends to test-time detection.

Contribution

It proposes a novel activation bounding approach that explicitly limits classification margins, improving backdoor mitigation and test-time detection over existing methods.
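To make "classification margin" concrete, here is a minimal PyTorch sketch. It assumes the common definition of the margin of class k as its logit minus the largest competing logit; that definition and the helper name `class_margin` are assumptions for illustration, not taken from this page.

```python
import torch


def class_margin(logits: torch.Tensor, k: int) -> torch.Tensor:
    """Hypothetical helper: margin of class k for each sample.

    Assumed definition: z_k minus the largest logit among the other classes.
    logits: shape (batch, num_classes) with num_classes >= 2.
    Returns shape (batch,); negative when class k is not the predicted class.
    """
    others = logits.clone()
    others[:, k] = float("-inf")  # exclude class k from the competitor maximum
    return logits[:, k] - others.max(dim=1).values


z = torch.tensor([[2.0, 5.0, 1.0],
                  [0.5, 0.0, 3.5]])
print(class_margin(z, 1))  # expected values: 3.0 and -3.5
```

When internal activations are bounded and fed to a linear classifier head, the logits, and therefore every such margin, can only range over a bounded interval. That is why choosing the activation bounds can directly limit margins.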

Findings

1. Superior backdoor mitigation performance on CIFAR-10
2. Robustness against adaptive and X2X attacks
3. Effective test-time detection and correction

Abstract

Deep neural networks are vulnerable to backdoor attacks (Trojans), where an attacker poisons the training set with backdoor triggers so that the neural network learns to classify test-time triggers to the attacker's designated target class. Recent work shows that backdoor poisoning induces over-fitting (abnormally large activations) in the attacked model, which motivates a general, post-training clipping method for backdoor mitigation, i.e., with bounds on internal-layer activations learned using a small set of clean samples. We devise a new such approach, choosing the activation bounds to explicitly limit classification margins. This method gives superior performance against peer methods for CIFAR-10 image classification. We also show that this method has strong robustness against adaptive attacks, X2X attacks, and on different datasets. Finally, we demonstrate a method extension for…
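The post-training clipping idea in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation, which is in the wanghangpsu/mmac repository. `UpperClip` and `max_class_margin` are hypothetical names, and the network is a tiny frozen MLP. Bounds are per feature and start at the largest activation seen on a small clean set. The margin term is evaluated on random "probe" inputs, a crude stand-in for inputs that maximise the margin. A KL term keeps the clipped model close to the original model on the clean samples.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class UpperClip(nn.Module):
    """Hypothetical helper: clamp each feature from above by a learnable bound."""

    def __init__(self, init_bound: torch.Tensor):
        super().__init__()
        self.bound = nn.Parameter(init_bound.clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The per-feature bound of shape (C,) broadcasts over the batch dimension.
        return torch.minimum(x, self.bound)


def max_class_margin(logits: torch.Tensor) -> torch.Tensor:
    """Largest (top logit minus runner-up logit) over the batch."""
    top2 = torch.topk(logits, k=2, dim=1).values
    return (top2[:, 0] - top2[:, 1]).max()


# Toy stand-in for a trained network: features -> ReLU -> linear classifier.
feat = nn.Sequential(nn.Linear(20, 64), nn.ReLU())
head = nn.Linear(64, 10)
for p in list(feat.parameters()) + list(head.parameters()):
    p.requires_grad_(False)  # network weights stay frozen; only bounds are learned

clean_x = torch.randn(128, 20)  # small clean set (synthetic, for illustration)
with torch.no_grad():
    ref_logits = head(feat(clean_x))          # original model's clean outputs
    init = feat(clean_x).max(dim=0).values    # per-feature maxima on clean data

clip = UpperClip(init)
opt = torch.optim.Adam(clip.parameters(), lr=1e-2)
lam = 1.0  # hypothetical trade-off weight between margin and fidelity

for _ in range(200):
    probe_x = 3.0 * torch.randn(64, 20)  # crude proxy for margin-maximising inputs
    margin = max_class_margin(head(clip(feat(probe_x))))
    clean_logits = head(clip(feat(clean_x)))
    # Keep clipped predictions close to the original ones on clean data.
    fidelity = F.kl_div(F.log_softmax(clean_logits, dim=1),
                        F.softmax(ref_logits, dim=1), reduction="batchmean")
    loss = margin + lam * fidelity
    opt.zero_grad()
    loss.backward()
    opt.step()
    with torch.no_grad():
        clip.bound.clamp_(min=0.0)  # ReLU outputs are non-negative; keep bounds >= 0
```

Only the bounds are optimised, which matches the post-training setting the abstract describes. A real convolutional network would need per-channel bounds broadcast over the spatial dimensions, and the paper's actual objective may differ from this simplified margin-plus-fidelity loss.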

Code & Models

Repositories

wanghangpsu/mmac
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications