Trace and Detect Adversarial Attacks on CNNs using Feature Response Maps

Mohammadreza Amirian; Friedhelm Schwenker; Thilo Stadelmann

arXiv:2208.11436 · cs.CV · August 25, 2022

TL;DR

This paper introduces a novel, human-interpretable method for detecting adversarial attacks on CNNs. It analyzes feature response maps with an entropy measure (average local spatial entropy) and is shown to be effective against state-of-the-art attacks on ImageNet.

Contribution

It proposes a new detection technique that tracks adversarial perturbations in a CNN's feature responses and flags attacks automatically, without modifying the network architecture, making deployed models harder to fool unnoticed.

Findings

01. Effective detection of adversarial examples on large-scale CNNs
02. The method is fully human-interpretable and does not alter the network architecture
03. Validated against state-of-the-art attacks on ImageNet

Abstract

The existence of adversarial attacks on convolutional neural networks (CNNs) calls into question the fitness of such models for serious applications. The attacks manipulate an input image so that misclassification is evoked while the image still looks normal to a human observer; they are thus not easily detectable. In a different context, backpropagated activations of CNN hidden layers, i.e. the "feature responses" to a given input, have helped visualize for a human "debugger" what the CNN "looks at" while computing its output. In this work, we propose a novel detection method for adversarial examples to prevent attacks. We do so by tracking adversarial perturbations in feature responses, allowing for automatic detection using average local spatial entropy. The method does not alter the original network architecture and is fully human-interpretable. Experiments confirm the validity of our approach…
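The detection step in the abstract can be illustrated with a minimal sketch of average local spatial entropy. It assumes a feature-response map is already available as a 2-D NumPy array (computing it from a CNN is out of scope here), splits the map into non-overlapping square tiles, computes the Shannon entropy of a value histogram in each tile, and averages the results. The tile size, bin count, min-max normalization, and the threshold rule in the hypothetical `flag_adversarial` helper are illustrative assumptions, not the paper's exact definition. The sketch also assumes that a focused response to a clean image yields low local entropy, while adversarial perturbations produce scattered, noise-like responses that raise it.

```python
import numpy as np


def local_spatial_entropy(fmap: np.ndarray, window: int = 3, bins: int = 16) -> np.ndarray:
    """Shannon entropy (bits) of the value histogram in each non-overlapping
    window x window tile of a 2-D map. Edge pixels that do not fill a whole
    tile are ignored. Window and bin sizes are illustrative assumptions."""
    fmap = np.asarray(fmap, dtype=float)
    lo, hi = fmap.min(), fmap.max()
    # Min-max normalize the whole map so the bins mean the same thing in every tile.
    norm = (fmap - lo) / (hi - lo) if hi > lo else np.zeros_like(fmap)
    rows, cols = fmap.shape[0] // window, fmap.shape[1] // window
    ent = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            tile = norm[i * window:(i + 1) * window, j * window:(j + 1) * window]
            counts, _ = np.histogram(tile, bins=bins, range=(0.0, 1.0))
            p = counts[counts > 0] / counts.sum()  # probabilities of occupied bins
            ent[i, j] = -np.sum(p * np.log2(p))
    return ent


def average_local_spatial_entropy(fmap: np.ndarray, window: int = 3, bins: int = 16) -> float:
    """Mean of the per-tile entropies: one scalar score per feature-response map."""
    return float(local_spatial_entropy(fmap, window, bins).mean())


def flag_adversarial(fmap: np.ndarray, threshold: float, window: int = 3, bins: int = 16) -> bool:
    """Hypothetical decision rule: flag the input when its score exceeds a
    threshold that would have to be calibrated on clean examples."""
    return average_local_spatial_entropy(fmap, window, bins) > threshold


# Usage on synthetic maps (stand-ins, not real CNN feature responses):
rng = np.random.default_rng(0)
yy, xx = np.mgrid[0:48, 0:48]
focused = np.exp(-((yy - 24) ** 2 + (xx - 24) ** 2) / (2 * 6.0 ** 2))  # one compact blob
scattered = focused + 0.3 * rng.standard_normal(focused.shape)  # same blob plus noise
print(average_local_spatial_entropy(focused), average_local_spatial_entropy(scattered))
```

Because the score is a single number per map, the only free parameter at detection time is the threshold. The per-tile entropy array can also be shown as a heat map next to the feature response, which keeps the decision inspectable by a human.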
