MM-BD: Post-Training Detection of Backdoor Attacks with Arbitrary Backdoor Pattern Types Using a Maximum Margin Statistic

Hang Wang; Zhen Xiang; David J. Miller; George Kesidis

arXiv:2205.06900·cs.LG·August 8, 2023·1 cites

TL;DR

This paper introduces a post-training method that detects whether a trained neural network classifier has been backdoor-attacked, for arbitrary backdoor pattern types, by analyzing a maximum margin statistic, without requiring clean data or assumptions about the attack type.

Contribution

It proposes a detection technique based on a maximum margin statistic that makes no assumption about the backdoor embedding function used by the attacker.
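As a rough illustration of the idea, the sketch below estimates a maximum margin statistic for each class of a toy linear classifier and flags the class whose statistic is an outlier. Everything in it is an assumption made for illustration rather than the authors' implementation: the linear logits `W @ x + b`, the box constraint on inputs, the function names, the step sizes, and the MAD-based outlier score, which is a stand-in for the paper's anomaly detector.

```python
import numpy as np


def max_margin_statistic(W, b, target, steps=200, lr=0.1, seed=0):
    """Projected subgradient ascent on the logit margin of class `target`.

    Toy model (assumption): logits = W @ x + b with inputs x in [0, 1]^d.
    Returns the margin logits[target] - max_{k != target} logits[k] at the
    final iterate.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=W.shape[1])
    for _ in range(steps):
        logits = W @ x + b
        others = logits.copy()
        others[target] = -np.inf
        runner_up = int(np.argmax(others))
        # Subgradient of the margin with respect to x for a linear model.
        grad = W[target] - W[runner_up]
        # Ascent step, then projection back onto the box [0, 1]^d.
        x = np.clip(x + lr * grad, 0.0, 1.0)
    logits = W @ x + b
    return float(logits[target] - np.delete(logits, target).max())


def mad_anomaly_scores(stats):
    """Robust z-scores based on the median absolute deviation (stand-in detector)."""
    stats = np.asarray(stats, dtype=float)
    med = np.median(stats)
    mad = np.median(np.abs(stats - med))
    return (stats - med) / (1.4826 * mad + 1e-12)


# Hypothetical toy setup: 5 classes, 8 features, class 2 has inflated weights.
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(5, 8))
W[2] += 2.0
b = np.zeros(5)

stats = [max_margin_statistic(W, b, c) for c in range(5)]
scores = mad_anomaly_scores(stats)
flagged = int(np.argmax(scores))
```

In this toy setup the inflated weights give class 2 a margin statistic far above the others, so it is the class that gets flagged. A real classifier would need gradients from an autodiff framework in place of the closed-form subgradient used here.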

Findings

01 Effective detection on four datasets with various backdoor patterns

02 Outperforms several state-of-the-art methods

03 Supports detection of attacks involving multiple source classes

Abstract

Backdoor attacks are an important type of adversarial threat against deep neural network classifiers, wherein test samples from one or more source classes will be (mis)classified to the attacker's target class when a backdoor pattern is embedded. In this paper, we focus on the post-training backdoor defense scenario commonly considered in the literature, where the defender aims to detect whether a trained classifier was backdoor-attacked without any access to the training set. Many post-training detectors are designed to detect attacks that use either one or a few specific backdoor embedding functions (e.g., patch-replacement or additive attacks). These detectors may fail when the backdoor embedding function used by the attacker (unknown to the defender) is different from the backdoor embedding function assumed by the defender. In contrast, we propose a post-training defense that…
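The two embedding functions named in the abstract can be sketched as follows. This is a minimal illustration, assuming images are NumPy arrays with pixel values in [0, 1]; the function names, the mask convention, and the clipping step are assumptions for this example and are not taken from the paper.

```python
import numpy as np


def embed_patch(x, patch, mask):
    """Patch replacement: pixels where mask == 1 are overwritten by the patch."""
    return x * (1.0 - mask) + patch * mask


def embed_additive(x, v):
    """Additive perturbation: add v, then clip back to the valid pixel range."""
    return np.clip(x + v, 0.0, 1.0)


# Hypothetical 4x4 grayscale image with a 2x2 white patch in the top-left corner.
image = np.full((4, 4), 0.5)
mask = np.zeros((4, 4))
mask[:2, :2] = 1.0
patch = np.ones((4, 4))

patched = embed_patch(image, patch, mask)
perturbed = embed_additive(image, np.full((4, 4), 0.1))
```

A detector that reverse-engineers only one of these forms has no guarantee against the other, which is the mismatch the abstract describes.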

Code & Models

Repositories

wanghangpsu/mm-bd (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning

Methods: Softmax