Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks

Danni Yuan; Shaokui Wei; Mingda Zhang; Li Liu; Baoyuan Wu

arXiv:2312.06230 · cs.CR · May 29, 2024 · 1 citation

TL;DR

This paper introduces a method for detecting poisoned samples in backdoor attacks by analyzing the circular distribution of activation gradients; the authors report that it outperforms existing detection methods.

Contribution

The paper proposes the Activation Gradient based Poisoned Sample Detection (AGPD) method, which uses the gradient circular distribution (GCD) to distinguish poisoned samples from clean ones across various attack scenarios.

Findings

01

The GCD of samples in the target class is much more dispersed than that of samples in a clean class.

02

Poisoned and clean samples in the target class are clearly separated in GCD.

03

AGPD outperforms existing detection methods in diverse backdoor attack settings.
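
The first two findings can be illustrated with a minimal sketch on synthetic data. This is not the paper's AGPD implementation: this page does not say how the angles or the reference direction are computed, so the choices below are assumptions. Random vectors stand in for real activation gradients, the reference is taken to be the gradient with the highest mean cosine similarity to the rest of its class, dispersion is measured as the standard deviation of the angles, and the split uses the largest gap between sorted angles. All function names are hypothetical.

```python
import numpy as np


def unit_rows(grads):
    """Scale every gradient vector (row) to unit length."""
    grads = np.asarray(grads, dtype=float)
    return grads / np.linalg.norm(grads, axis=1, keepdims=True)


def angles_to_reference(grads):
    """Angle (radians) between each gradient and a reference gradient.

    Assumption: the reference is the sample whose gradient has the highest
    mean cosine similarity to all gradients in the class.
    """
    u = unit_rows(grads)
    sim = u @ u.T                                  # pairwise cosine similarity
    ref = int(np.argmax(sim.mean(axis=1)))         # most "central" sample
    return np.arccos(np.clip(sim[ref], -1.0, 1.0))


def dispersion(angles):
    """Hypothetical dispersion score: standard deviation of the angles."""
    return float(np.std(angles))


def split_by_largest_gap(angles):
    """Flag samples whose angle lies above the largest gap in sorted angles."""
    ordered = np.sort(angles)
    k = int(np.argmax(np.diff(ordered)))
    threshold = 0.5 * (ordered[k] + ordered[k + 1])
    return angles > threshold


# Synthetic stand-ins for activation gradients (not real model gradients).
rng = np.random.default_rng(0)
dim, noise = 16, 0.02
clean_dir = np.eye(dim)[0]     # direction shared by clean samples
poison_dir = np.eye(dim)[1]    # different direction for poisoned samples

# A clean class: 100 gradients clustered around one direction.
clean_class = clean_dir + noise * rng.standard_normal((100, dim))

# A target class: 70 clean gradients plus 30 poisoned ones.
target_class = np.vstack([
    clean_dir + noise * rng.standard_normal((70, dim)),
    poison_dir + noise * rng.standard_normal((30, dim)),
])
is_poisoned = np.arange(100) >= 70

clean_angles = angles_to_reference(clean_class)
target_angles = angles_to_reference(target_class)
flagged = split_by_largest_gap(target_angles)

print("clean-class dispersion:", dispersion(clean_angles))
print("target-class dispersion:", dispersion(target_angles))
print("flagged samples:", int(flagged.sum()))
```

In this toy setup the target class shows a bimodal set of angles, so its dispersion is larger than that of the clean class and the gap-based split recovers the planted poisoned samples. Real gradients would be noisier, and the largest-gap rule is only meaningful after a class has already been identified as the target class.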

Abstract

This work studies the task of poisoned sample detection for defending against data poisoning based backdoor attacks. Its core challenge is finding a generalizable and discriminative metric to distinguish between clean and various types of poisoned samples (e.g., various triggers, various poisoning ratios). Inspired by a common phenomenon in backdoor attacks, namely that the backdoored model tends to map significantly different poisoned and clean samples within the target class to similar activation areas, we introduce a novel perspective of the circular distribution of the gradients w.r.t. sample activation, dubbed gradient circular distribution (GCD). We make two observations based on GCD. One is that the GCD of samples in the target class is much more dispersed than that in the clean class. The other is that in the GCD of the target class, poisoned and clean samples are clearly…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · COVID-19 diagnosis using AI