Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors

Gilad Cohen; Guillermo Sapiro; Raja Giryes

arXiv:1909.06872 · cs.LG · March 20, 2020

TL;DR

This paper introduces an adversarial sample detection method that combines influence functions with k-nearest neighbors (k-NN) and reports state-of-the-art detection results on six attack methods across three datasets.

Contribution

The method uses influence functions and k-NN to detect adversarial examples and is suitable for any pre-trained neural network classifier.

Findings

01 Achieves state-of-the-art detection accuracy on six attack methods

02 Effective across three different datasets

03 Utilizes influence functions and k-NN for robust detection

Abstract

Deep neural networks (DNNs) are notorious for their vulnerability to adversarial attacks, which are small perturbations added to their input images to mislead their prediction. Detection of adversarial examples is, therefore, a fundamental requirement for robust classification frameworks. In this work, we present a method for detecting such adversarial attacks, which is suitable for any pre-trained neural network classifier. We use influence functions to measure the impact of every training sample on the validation set data. From the influence scores, we find the most supportive training samples for any given validation example. A k-nearest neighbor (k-NN) model fitted on the DNN's activation layers is employed to search for the ranking of these supporting training samples. We observe that these samples are highly correlated with the nearest neighbors of the normal inputs, while this…
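
The ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the influence scores have already been computed elsewhere and reduced to a list of "most supportive" training indices (the hypothetical `support_idx`), and it uses tiny hand-made 2-D vectors in place of real DNN activations. The hypothetical helper `neighbor_ranks` only shows what "searching for the ranking of supporting training samples" among nearest neighbors could look like.

```python
import math


def neighbor_ranks(train_feats, query_feat, support_idx):
    """Return (rank, distance) for each supporting training sample.

    Rank 1 means the sample is the nearest neighbor of the query in
    feature space. Brute-force search; illustrative only.
    """
    # Euclidean distance from the query to every training feature vector.
    dists = [math.dist(f, query_feat) for f in train_feats]
    # Training indices sorted from nearest to farthest.
    order = sorted(range(len(train_feats)), key=lambda i: dists[i])
    rank_of = {idx: r + 1 for r, idx in enumerate(order)}
    return [(rank_of[i], dists[i]) for i in support_idx]


# Toy stand-ins for activations of four training samples (assumption).
train_feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
# Hypothetical indices of the most supportive training samples,
# as would be selected from precomputed influence scores.
support_idx = [0, 1]

# A query close to its supporting samples, and one far from them.
near = neighbor_ranks(train_feats, [0.1, 0.0], support_idx)
far = neighbor_ranks(train_feats, [4.0, 4.0], support_idx)

print("near query ranks:", [r for r, _ in near])
print("far query ranks: ", [r for r, _ in far])
```

In this toy setup the supporting samples rank first among the neighbors of the nearby query and last for the distant one. How such ranks and distances are turned into a detection decision is not covered by the truncated abstract, so the sketch stops at the ranking step.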

Code & Models

Repositories

giladcohen/NNIF_adv_defense
tf · Official

Videos

Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors · YouTube

Taxonomy

Methods: k-Nearest Neighbors