SpaNN: Detecting Multiple Adversarial Patches on CNNs by Spanning Saliency Thresholds
Mauricio Byrd Victorica, György Dán, Henrik Sandberg

TL;DR
SpaNN is a novel attack detection method for CNNs that efficiently identifies multiple adversarial patches using saliency thresholds and clustering, outperforming existing defenses in object detection and classification tasks.
Contribution
The paper introduces SpaNN, a robust and computationally efficient detector for multiple adversarial patches. It does not rely on a fixed saliency threshold and remains effective against white-box attacks.
Findings
SpaNN outperforms state-of-the-art defenses by up to 11 percentage points in object detection.
SpaNN outperforms state-of-the-art defenses by up to 27 percentage points in image classification.
SpaNN's computational complexity is independent of the expected number of adversarial patches.
Abstract
State-of-the-art convolutional neural network models for object detection and image classification are vulnerable to physically realizable adversarial perturbations, such as patch attacks. Existing defenses have focused, implicitly or explicitly, on single-patch attacks, leaving their sensitivity to the number of patches as an open question or rendering them computationally infeasible or inefficient against attacks consisting of multiple patches in the worst cases. In this work, we propose SpaNN, an attack detector whose computational complexity is independent of the expected number of adversarial patches. The key novelty of the proposed detector is that it builds an ensemble of binarized feature maps by applying a set of saliency thresholds to the neural activations of the first convolutional layer of the victim model. It then performs clustering on the ensemble and uses the cluster…
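The thresholding-and-clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the min-max normalization, the toy saliency map, and the threshold grid are all hypothetical, and 4-connected component counting is swapped in as a simple stand-in for the clustering step. What the detector does with the clustering output is cut off in the abstract above, so the sketch stops at a per-threshold cluster count.

```python
from collections import deque

import numpy as np


def binarized_ensemble(saliency, thresholds):
    """Binarize a 2D saliency map at each threshold; returns a (T, H, W) bool array.

    `saliency` stands in for first-layer activations reduced to one channel
    (an assumption; the reduction is not specified in the text above).
    """
    s = saliency.astype(float)
    span = s.max() - s.min()
    # Min-max normalize so thresholds in [0, 1] are comparable across inputs.
    s = (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return np.stack([s >= t for t in thresholds])


def count_clusters(mask):
    """Count 4-connected components of True pixels (stand-in for clustering)."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    count = 0
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                count += 1
                seen[i, j] = True
                queue = deque([(i, j)])
                while queue:  # breadth-first flood fill of one component
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
    return count


def cluster_curve(saliency, thresholds):
    """Number of clusters in each binarized map, one entry per threshold."""
    return [count_clusters(m) for m in binarized_ensemble(saliency, thresholds)]


# Toy map with two high-activation blobs of different strength.
toy = np.zeros((8, 8))
toy[1:3, 1:3] = 1.0
toy[5:7, 5:7] = 0.8
thresholds = np.linspace(0.1, 0.9, 5)
curve = cluster_curve(toy, thresholds)
print(curve)
```

Sweeping a set of thresholds, rather than committing to one, is the point of the ensemble: in this toy case the weaker blob survives every threshold except the highest, so the cluster count changes along the sweep instead of depending on a single hand-picked cutoff.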
