TL;DR
This paper introduces an adversarial sample detection method that combines influence functions with k-nearest neighbors (k-NN), achieving state-of-the-art detection on six attack methods across three datasets.
Contribution
The method uses influence functions and k-NN to detect adversarial examples, and it applies to any pre-trained neural network classifier.
Findings
Achieves state-of-the-art detection accuracy on six attack methods
Effective across three different datasets
Utilizes influence functions and k-NN for robust detection
Abstract
Deep neural networks (DNNs) are notorious for their vulnerability to adversarial attacks, which are small perturbations added to their input images to mislead their prediction. Detection of adversarial examples is, therefore, a fundamental requirement for robust classification frameworks. In this work, we present a method for detecting such adversarial attacks, which is suitable for any pre-trained neural network classifier. We use influence functions to measure the impact of every training sample on the validation set data. From the influence scores, we find the most supportive training samples for any given validation example. A k-nearest neighbor (k-NN) model fitted on the DNN's activation layers is employed to search for the ranking of these supporting training samples. We observe that these samples are highly correlated with the nearest neighbors of the normal inputs, while this…
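The rank lookup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `top_helpful` and `knn_ranks`, the toy 2-D "activations", and the influence scores are all hypothetical, and the sketch assumes that a higher influence score means a more supportive training sample (flip the sign if your convention differs). It uses a brute-force Euclidean neighbor search in place of a fitted k-NN model.

```python
import math

def top_helpful(influence_scores, m):
    """Indices of the m training samples with the highest influence score.

    Assumption: higher score = more supportive of the prediction.
    """
    order = sorted(range(len(influence_scores)),
                   key=lambda i: influence_scores[i], reverse=True)
    return order[:m]

def knn_ranks(train_feats, query_feat, helpful_idx):
    """Rank (0 = nearest) of each helpful training sample among all
    training samples, sorted by Euclidean distance to the query's
    activation vector."""
    dists = [math.dist(f, query_feat) for f in train_feats]
    order = sorted(range(len(train_feats)), key=lambda i: dists[i])
    rank_of = {idx: rank for rank, idx in enumerate(order)}
    return [rank_of[i] for i in helpful_idx]

# Toy activations for four training samples (hypothetical values).
train_feats = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (6.0, 5.0)]
# Precomputed influence scores for one input (hypothetical values).
influence = [0.9, 0.7, -0.2, 0.1]
helpful = top_helpful(influence, 2)  # [0, 1]

# An input whose activations sit near its supporting samples:
normal_ranks = knn_ranks(train_feats, (0.2, 0.1), helpful)   # [0, 1]
# An input whose activations sit far from its supporting samples:
shifted_ranks = knn_ranks(train_feats, (5.2, 5.1), helpful)  # [3, 2]
```

In this toy setup, low ranks mean the supporting training samples are also the nearest neighbors in activation space, which is the correlation the abstract reports for normal inputs. How the ranks are turned into a detection decision is not covered by the excerpt above, so the sketch stops at the rank computation.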
Videos
Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors (YouTube)
Taxonomy
Methods: k-Nearest Neighbors
