MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic
Hang Wang, Zhen Xiang, David J. Miller, George Kesidis

TL;DR
This paper introduces a post-training backdoor detection method that identifies arbitrary backdoor patterns in neural network classifiers by analyzing a maximum margin statistic, without requiring clean data or assumptions about the attack type.
Contribution
It proposes a detection technique based on a maximum margin statistic that is designed to work against any type of backdoor embedding, without prior knowledge of the embedding function the attacker used.
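This summary names a maximum margin statistic but does not spell out how it is computed, so the following is only a minimal sketch under stated assumptions. It assumes the margin of a class is its logit minus the largest other logit, estimates each class's maximum margin by random search over inputs in [0, 1]^dim on a toy linear model, and swaps in a simple median/MAD outlier rule in place of whatever inference step the paper uses. All function names, the toy model, and the threshold are hypothetical.

```python
import numpy as np

def margin(logits, c):
    """Margin of class c: its logit minus the largest logit among the other classes."""
    others = np.delete(logits, c, axis=-1)
    return logits[..., c] - others.max(axis=-1)

def max_margin_per_class(logit_fn, num_classes, dim, n_samples=2000, seed=0):
    """Crude per-class maximum-margin estimate via random search over [0, 1]^dim.

    Assumption: random search stands in for a proper optimizer over the input domain.
    """
    rng = np.random.default_rng(seed)
    x = rng.random((n_samples, dim))
    logits = logit_fn(x)  # shape: (n_samples, num_classes)
    return np.array([margin(logits, c).max() for c in range(num_classes)])

def flag_outlier_classes(stats, threshold=3.5):
    """Flag classes with an unusually large statistic (median/MAD rule, an assumption)."""
    med = np.median(stats)
    mad = np.median(np.abs(stats - med)) + 1e-12  # avoid division by zero
    score = 0.6745 * (stats - med) / mad
    return np.flatnonzero(score > threshold)

# Toy linear "classifier" with one class whose logit is artificially inflated,
# standing in for a backdoor target class (purely illustrative).
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 6))
W[:, 3] += 10.0
stats = max_margin_per_class(lambda x: x @ W, num_classes=6, dim=8)
suspects = flag_outlier_classes(stats)
print("max-margin estimates:", np.round(stats, 2))
print("flagged classes:", suspects)
```

The sketch needs only a function that maps inputs to logits, which mirrors the setting described here: no training set and no assumed backdoor pattern. A real detector would replace the random search and the outlier rule with the procedures given in the paper.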
Findings
Effective detection on four datasets with various backdoor patterns
Outperforms several state-of-the-art methods
Supports detection of multiple source classes
Abstract
Backdoor attacks are an important type of adversarial threat against deep neural network classifiers, wherein test samples from one or more source classes will be (mis)classified to the attacker's target class when a backdoor pattern is embedded. In this paper, we focus on the post-training backdoor defense scenario commonly considered in the literature, where the defender aims to detect whether a trained classifier was backdoor-attacked without any access to the training set. Many post-training detectors are designed to detect attacks that use either one or a few specific backdoor embedding functions (e.g., patch-replacement or additive attacks). These detectors may fail when the backdoor embedding function used by the attacker (unknown to the defender) is different from the backdoor embedding function assumed by the defender. In contrast, we propose a post-training defense that…
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Softmax
