BadActs: A Universal Backdoor Defense in the Activation Space

Biao Yi; Sishuo Chen; Yiming Li; Tong Li; Baolei Zhang; Zheli Liu

arXiv:2405.11227·cs.CR·May 21, 2024

TL;DR

This paper proposes a universal backdoor defense method that purifies backdoor samples in the activation space, effectively counteracting diverse triggers while maintaining high accuracy on clean data.

Contribution

It introduces a novel activation space purification approach and a detection module, improving robustness against backdoor attacks across various trigger types.

Findings

01 Effective removal of backdoor triggers in activation space

02 Maintains high clean data accuracy

03 Outperforms existing defenses in diverse scenarios

Abstract

Backdoor attacks pose an increasingly severe security threat to Deep Neural Networks (DNNs) during their development stage. In response, backdoor sample purification has emerged as a promising defense mechanism, aiming to eliminate backdoor triggers while preserving the integrity of the clean content in the samples. However, existing approaches have predominantly focused on the word space; they are ineffective against feature-space triggers and significantly impair performance on clean data. To address this, we introduce a universal backdoor defense that purifies backdoor samples in the activation space by drawing abnormal activations towards optimized minimum clean activation distribution intervals. The advantages of our approach are twofold: (1) By operating in the activation space, our method captures information ranging from surface-level features like words to higher-level semantic concepts…
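
The idea of drawing abnormal activations back towards clean activation intervals can be illustrated with a small sketch. This is not the authors' implementation: the abstract speaks of optimized minimum intervals, whereas the code below assumes a much simpler stand-in, a per-neuron interval of mean ± k standard deviations estimated from clean samples. The function names (`fit_intervals`, `purify`, `anomaly_score`) and the parameter `k` are hypothetical, and the anomaly score is only one plausible way a detection signal could be derived from the same intervals.

```python
from statistics import mean, pstdev


def fit_intervals(clean_acts, k=3.0):
    """Estimate a [low, high] interval per neuron from clean activations.

    clean_acts: list of activation vectors, one per clean sample.
    Assumption: mean +/- k * std stands in for the paper's optimized intervals.
    """
    intervals = []
    for column in zip(*clean_acts):  # iterate neuron by neuron
        mu, sigma = mean(column), pstdev(column)
        intervals.append((mu - k * sigma, mu + k * sigma))
    return intervals


def purify(acts, intervals):
    """Clamp each activation into the clean interval of its neuron."""
    return [min(max(a, low), high) for a, (low, high) in zip(acts, intervals)]


def anomaly_score(acts, intervals):
    """Fraction of neurons whose activation lies outside its clean interval."""
    outside = sum(1 for a, (low, high) in zip(acts, intervals)
                  if a < low or a > high)
    return outside / len(intervals)


# Toy data: three clean samples over two neurons.
clean = [[0.9, 0.1], [1.0, 0.2], [1.1, 0.3]]
intervals = fit_intervals(clean, k=3.0)

# The second neuron fires far outside its clean range.
suspicious = [1.0, 9.0]
purified = purify(suspicious, intervals)
```

In this toy case only the second neuron is pulled back to the upper edge of its interval, while the first, already in range, is left unchanged. That is the property that lets a purification step of this kind leave clean inputs largely untouched.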

Code & Models

Repositories

clearloveclearlove/BadActs (PyTorch, official)

Videos

BadActs: A Universal Backdoor Defense in the Activation Space (Underline)

Taxonomy

Topics: Security and Verification in Computing