Backdoor Mitigation by Correcting the Distribution of Neural Activations

Xi Li; Zhen Xiang; David J. Miller; George Kesidis

arXiv:2308.09850 · cs.LG · August 22, 2023 · 2 cites


Open Access

TL;DR

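This paper shows that a successful backdoor attack alters the distribution of internal-layer activations for trigger instances relative to clean instances, and proposes a post-training mitigation method that corrects this alteration using reverse-engineered triggers, without retraining the model. The same correction also improves detection of trigger instances.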

Contribution

The paper introduces a post-training backdoor mitigation technique that corrects the shift in activation distributions caused by the backdoor trigger. It avoids retraining and also improves detection of trigger instances.

Findings

01. Effective backdoor mitigation without retraining.

02. Improved detection of trigger instances.

03. Outperforms existing methods in mitigation performance.

Abstract

Backdoor (Trojan) attacks are an important type of adversarial exploit against deep neural networks (DNNs), wherein a test instance is (mis)classified to the attacker's target class whenever the attacker's backdoor trigger is present. In this paper, we reveal and analyze an important property of backdoor attacks: a successful attack causes an alteration in the distribution of internal layer activations for backdoor-trigger instances, compared to that for clean instances. Even more importantly, we find that instances with the backdoor trigger will be correctly classified to their original source classes if this distribution alteration is corrected. Based on our observations, we propose an efficient and effective method that achieves post-training backdoor mitigation by correcting the distribution alteration using reverse-engineered triggers. Notably, our method does not change any…
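
To make the idea of a "distribution alteration" concrete, the following is a minimal sketch on synthetic data, not the authors' procedure. It assumes a single internal neuron whose activations are Gaussian, with hypothetical parameters chosen for illustration, and it swaps in plain mean/standard-deviation matching as a stand-in for the paper's correction, which the abstract says relies on reverse-engineered triggers. The function name `fit_affine_correction` and all numeric values are assumptions.

```python
import random
import statistics

random.seed(0)

# Synthetic stand-ins for one internal neuron's activations (not real DNN data).
# The means and spreads below are hypothetical values chosen for illustration.
clean = [random.gauss(0.0, 1.0) for _ in range(5000)]      # clean instances
triggered = [random.gauss(3.0, 2.0) for _ in range(5000)]  # instances with a trigger


def fit_affine_correction(clean_acts, trigger_acts):
    """Return (scale, shift) so that scale * a + shift maps the trigger
    activations onto the mean and standard deviation of the clean ones."""
    scale = statistics.pstdev(clean_acts) / statistics.pstdev(trigger_acts)
    shift = statistics.fmean(clean_acts) - scale * statistics.fmean(trigger_acts)
    return scale, shift


scale, shift = fit_affine_correction(clean, triggered)

# Apply the correction to the activations only; no model weights are involved.
corrected = [scale * a + shift for a in triggered]

gap_before = abs(statistics.fmean(triggered) - statistics.fmean(clean))
gap_after = abs(statistics.fmean(corrected) - statistics.fmean(clean))
print(f"mean gap before correction: {gap_before:.3f}")
print(f"mean gap after correction:  {gap_after:.3f}")
```

The sketch only matches the first two moments of one activation. It is meant to show what "correcting the distribution" can look like in the simplest case, and says nothing about how the paper estimates triggers or chooses layers.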


Taxonomy

Topics: Adversarial Robustness in Machine Learning