Activation Gradient based Poisoned Sample Detection Against Backdoor Attacks
Danni Yuan, Shaokui Wei, Mingda Zhang, Li Liu, Baoyuan Wu

TL;DR
This paper introduces a novel method for detecting poisoned samples in backdoor attacks by analyzing the circular distribution of activation gradients, and it outperforms existing detection techniques.
Contribution
The paper proposes the Activation Gradient based Poisoned Sample Detection (AGPD) method, which leverages the gradient circular distribution (GCD) to distinguish poisoned from clean samples across various attack scenarios.
Findings
The GCD of target-class samples is more dispersed than that of clean-class samples.
Within the target class, poisoned and clean samples are clearly separated in the GCD.
AGPD outperforms existing detection methods in diverse backdoor attack settings.
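The first finding can be illustrated with a standard directional-statistics measure. The sketch below is not the paper's metric. It assumes per-class activation gradients are already available as NumPy arrays, and it uses circular variance (one minus the mean resultant length of the normalized gradients) as a stand-in dispersion score. The functions `circular_variance` and `most_dispersed_class` are hypothetical names, and the data is synthetic.

```python
import numpy as np

def circular_variance(grads: np.ndarray) -> float:
    """Circular variance (1 - mean resultant length) of gradient directions.

    grads: (n_samples, dim) activation gradients for one class (assumed given).
    Returns a value in [0, 1]; larger means more dispersed directions.
    """
    # Normalize each gradient so only its direction matters.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.clip(norms, 1e-12, None)
    # Mean resultant length: norm of the average unit vector.
    r_bar = np.linalg.norm(unit.mean(axis=0))
    return float(1.0 - r_bar)

def most_dispersed_class(grads_by_class: dict) -> int:
    """Return the class label whose gradient directions are most dispersed."""
    scores = {c: circular_variance(g) for c, g in grads_by_class.items()}
    return max(scores, key=scores.get)

# Synthetic example: class 0 gradients share one direction; class 1 mixes
# a dominant direction with a smaller group pointing elsewhere.
rng = np.random.default_rng(0)
base = np.array([1.0, 0.0, 0.0])
clean = base + 0.05 * rng.standard_normal((100, 3))
mixed = np.vstack([
    base + 0.05 * rng.standard_normal((80, 3)),
    np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((20, 3)),
])
print(most_dispersed_class({0: clean, 1: mixed}))  # prints 1
```

Circular variance is used here because it depends only on gradient directions, not magnitudes. The actual AGPD dispersion criterion may differ.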
Abstract
This work studies the task of poisoned sample detection for defending against data-poisoning-based backdoor attacks. Its core challenge is finding a generalizable and discriminative metric that distinguishes clean samples from various types of poisoned samples (e.g., different triggers and different poisoning ratios). We draw on a common phenomenon in backdoor attacks: the backdoored model tends to map poisoned and clean samples within the target class to similar activation areas, even though the samples themselves differ significantly. Building on this, we introduce a novel perspective based on the circular distribution of the gradients w.r.t. sample activation, dubbed gradient circular distribution (GCD). Based on GCD, we make two interesting observations. First, the GCD of samples in the target class is much more dispersed than that of samples in clean classes. Second, in the GCD of the target class, poisoned and clean samples are clearly…
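To make the idea of a gradient circular distribution more concrete, here is a minimal sketch under stated assumptions. Activation gradients for one class are given as rows of a NumPy array. Each sample is mapped to the angle between its gradient and the class's mean gradient direction, and the angles are then split into two groups with 1-D 2-means. The choice of reference direction, the clustering step, and the function names `gradient_angles` and `two_means_1d` are hypothetical and chosen only for illustration; this is not the AGPD procedure as published.

```python
from typing import Optional

import numpy as np

def gradient_angles(grads: np.ndarray, ref: Optional[np.ndarray] = None) -> np.ndarray:
    """Angle in radians, in [0, pi], between each gradient and a reference direction.

    grads: (n_samples, dim) activation gradients for one class (assumed given).
    ref:   reference direction; defaults to the mean normalized gradient (assumption).
    """
    unit = grads / np.clip(np.linalg.norm(grads, axis=1, keepdims=True), 1e-12, None)
    if ref is None:
        ref = unit.mean(axis=0)
    ref = ref / max(float(np.linalg.norm(ref)), 1e-12)
    cos = np.clip(unit @ ref, -1.0, 1.0)  # guard against rounding outside [-1, 1]
    return np.arccos(cos)

def two_means_1d(x: np.ndarray, iters: int = 50) -> np.ndarray:
    """Split 1-D values into two groups with Lloyd's k-means (k=2).

    Returns 0/1 labels; label 1 is the group whose center started at the maximum.
    """
    centers = np.array([x.min(), x.max()], dtype=float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Assign each value to the nearer center.
        labels = (np.abs(x - centers[1]) < np.abs(x - centers[0])).astype(int)
        new_centers = np.array([
            x[labels == k].mean() if np.any(labels == k) else centers[k]
            for k in (0, 1)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic example: 80 gradients share a dominant direction, 20 point elsewhere.
rng = np.random.default_rng(0)
majority = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((80, 3))
minority = np.array([0.0, 1.0, 0.0]) + 0.05 * rng.standard_normal((20, 3))
grads = np.vstack([majority, minority])
angles = gradient_angles(grads)
labels = two_means_1d(angles)
```

In this toy setup, the 20 gradients that point away from the dominant direction fall into group 1. Which group corresponds to poisoned samples in practice depends on the paper's detection rule, which this sketch does not attempt to reproduce.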
