Can the state of relevant neurons in a deep neural networks serve as indicators for detecting adversarial attacks?
Roger Granda, Tinne Tuytelaars, Jose Oramas

TL;DR
This paper proposes a neuron-based detection method for adversarial attacks in deep neural networks, monitoring relevant neurons' states to identify subtle input perturbations that compromise model integrity.
Contribution
It introduces a novel approach that inspects a sparse set of relevant neurons to detect adversarial attacks, demonstrating effectiveness comparable to existing methods.
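The first step of such an approach is to pick, for each class, a small set of neurons whose activity is characteristic of that class. The sketch below uses a simple relevance score: a neuron's mean activation on the class minus its mean activation on all other classes. It then keeps the top `k` neurons per class. The function name `select_relevant_neurons`, the score itself, and `k` are illustrative assumptions, not the authors' selection procedure.

```python
import numpy as np


def select_relevant_neurons(activations, labels, k):
    """Return a dict mapping each class to the indices of its k most relevant neurons.

    activations: array of shape (n_samples, n_neurons), e.g. pooled ReLU outputs
                 of one layer of a trained classifier (assumed input format).
    labels:      integer class labels of shape (n_samples,).
    Relevance here is a hypothetical stand-in: mean activation on the class
    minus mean activation on every other class.
    """
    activations = np.asarray(activations, dtype=float)
    labels = np.asarray(labels)
    relevant = {}
    for c in np.unique(labels):
        in_class = activations[labels == c].mean(axis=0)
        out_class = activations[labels != c].mean(axis=0)
        score = in_class - out_class
        # argsort is ascending, so reverse it and keep the k highest-scoring neurons.
        relevant[int(c)] = np.argsort(score)[::-1][:k]
    return relevant
```

In practice, `activations` would come from a trained model evaluated on clean training images. A sparsity-inducing selector could replace the plain top-`k` rule without changing the rest of the pipeline.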
Findings
Detects adversarial samples with accuracy similar to state-of-the-art detectors.
Identifies neurons whose states change in the presence of adversarial inputs.
Provides qualitative insights into neuron behavior under attack.
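One way to make "a neuron's state changes" concrete is to read a neuron's state as whether its ReLU fires, meaning its pre-activation is above zero. The sketch below compares these on/off states for a clean input and a perturbed input and reports which relevant neurons flipped. The helpers `neuron_states` and `flipped_relevant_neurons` are hypothetical. The toy perturbation is a hand-built sign step on a single linear layer, not an attack on a real classifier.

```python
import numpy as np


def neuron_states(pre_activations, threshold=0.0):
    """On/off state per neuron: True where the pre-activation exceeds threshold
    (with threshold 0 this is exactly where a ReLU unit would fire)."""
    return np.asarray(pre_activations, dtype=float) > threshold


def flipped_relevant_neurons(clean_pre, adv_pre, relevant_idx):
    """Return the relevant neuron indices whose on/off state differs between
    the clean and the perturbed input, plus the fraction of them that flipped."""
    relevant_idx = np.asarray(relevant_idx)
    clean_state = neuron_states(clean_pre)[relevant_idx]
    adv_state = neuron_states(adv_pre)[relevant_idx]
    flipped = relevant_idx[clean_state != adv_state]
    return flipped, len(flipped) / len(relevant_idx)


if __name__ == "__main__":
    # Toy linear layer with hand-picked weights: pre-activations are z = W @ x.
    W = np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 0.2]])
    x = np.array([0.3, 0.1])
    # Step against the sign of neuron 0's gradient (which is W[0]) to push it below zero.
    x_adv = x - 0.15 * np.sign(W[0])
    flipped, frac = flipped_relevant_neurons(W @ x, W @ x_adv, relevant_idx=[0, 1])
    print(flipped, frac)  # neuron 0 switches off; neuron 1 stays on
```

A small, bounded change to the input is enough to switch off a neuron that was active. A detector built on relevant neurons aims to pick up this kind of flip.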
Abstract
We present a method for adversarial attack detection based on the inspection of a sparse set of neurons. We follow the hypothesis that adversarial attacks introduce imperceptible perturbations in the input and that these perturbations change the state of neurons relevant for the concepts modelled by the attacked model. Therefore, monitoring the status of these neurons would enable the detection of adversarial attacks. Focusing on the image classification task, our method identifies neurons that are relevant for the classes predicted by the model. A deeper qualitative inspection of this sparse set of neurons indicates that their state changes in the presence of adversarial samples. Moreover, quantitative results from our empirical evaluation indicate that our method is capable of recognizing adversarial samples, produced by state-of-the-art attack methods, with comparable accuracy to…
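The detection idea in the abstract reduces to a question: do the relevant neurons of the predicted class behave as they do on clean data? The sketch below makes that concrete under stated assumptions. It takes relevant-neuron indices per class as given and records how often each of those neurons is active on clean samples. A new sample is then scored by the mean gap between its on/off states and those clean rates, and it is flagged when the score exceeds a quantile of the clean scores. The scoring rule, the 95% quantile, and every function name here are hypothetical choices for illustration. The paper's own detector may work differently.

```python
import numpy as np


def fit_reference_rates(clean_acts, clean_preds, relevant):
    """For each class, the fraction of clean samples predicted as that class in
    which each of the class's relevant neurons is active (> 0)."""
    clean_acts = np.asarray(clean_acts, dtype=float)
    clean_preds = np.asarray(clean_preds)
    rates = {}
    for c, idx in relevant.items():
        states = clean_acts[clean_preds == c][:, idx] > 0
        rates[c] = states.mean(axis=0)
    return rates


def deviation_score(acts, pred, relevant, rates):
    """Mean absolute gap between a sample's on/off states on the predicted
    class's relevant neurons and the clean activation rates of those neurons."""
    idx = relevant[pred]
    states = (np.asarray(acts, dtype=float)[idx] > 0).astype(float)
    return float(np.abs(states - rates[pred]).mean())


def calibrate_threshold(clean_acts, clean_preds, relevant, rates, quantile=0.95):
    """Threshold = chosen quantile of the deviation scores of clean samples."""
    scores = [deviation_score(a, int(p), relevant, rates)
              for a, p in zip(clean_acts, clean_preds)]
    return float(np.quantile(scores, quantile))


def is_adversarial(acts, pred, relevant, rates, threshold):
    """Flag a sample whose relevant-neuron states deviate more than clean data does."""
    return deviation_score(acts, pred, relevant, rates) > threshold
```

A sample that the model labels as class 0, but which switches off class 0's relevant neurons, receives a high score and is flagged. Raising the quantile trades fewer false alarms on clean data for more missed attacks.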
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques
