Inspecting adversarial examples using the Fisher information

Jörg Martin; Clemens Elster

arXiv:1909.05527·cs.LG·September 13, 2019

TL;DR

This paper studies whether the Fisher information can be used to detect adversarial examples for neural networks. It also provides a visual analysis tool and demonstrates the approach on the MNIST, CIFAR10 and Fruits-360 datasets.

Contribution

It discusses Fisher information-based quantities for identifying adversarial inputs and for visualizing the importance of single input neurons, supporting interpretability and robustness analysis.

Findings

01

Fisher information-based quantities highlight the importance of single input neurons

02

Computation of the proposed quantities scales well with network size

03

Demonstrated on the MNIST, CIFAR10 and Fruits-360 datasets

Abstract

Adversarial examples are slight perturbations that are designed to fool artificial neural networks when fed as an input. In this work the usability of the Fisher information for the detection of such adversarial attacks is studied. We discuss various quantities whose computation scales well with the network size, study their behavior on adversarial examples and show how they can highlight the importance of single input neurons, thereby providing a visual tool for further analyzing (un-)reasonable behavior of a neural network. The potential of our methods is demonstrated by applications to the MNIST, CIFAR10 and Fruits-360 datasets.
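
To make the idea of a Fisher information-based score per input neuron more concrete, here is a minimal sketch. The abstract does not spell out which quantities the paper uses, so everything below is an assumption made for illustration: a linear softmax classifier with hypothetical weights `W` and bias `b`, and the Fisher information taken with respect to the input `x`, i.e. F(x) = sum_y p(y|x) g_y g_y^T with g_y = grad_x log p(y|x). The function names are hypothetical and not taken from the authors' code.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax of a 1-D logit vector."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def input_fisher_diag(W, b, x):
    """Diagonal of F(x) = sum_y p(y|x) g_y g_y^T for logits W x + b.

    Assumption for illustration: a linear softmax model, for which
    g_y = grad_x log p(y|x) = W^T (e_y - p).
    Returns one non-negative score per input neuron.
    """
    p = softmax(W @ x + b)
    # Row y of G is g_y; np.eye(K) - p broadcasts p across rows (e_y - p).
    G = (np.eye(len(p)) - p) @ W
    # Weight the squared gradients by the class probabilities.
    return p @ (G ** 2)

def input_fisher_trace(W, b, x):
    """Trace of F(x): a single scalar summary of the per-input scores."""
    return input_fisher_diag(W, b, x).sum()

# Toy usage with random (hypothetical) weights and a random input.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 5))   # 3 classes, 5 input neurons
b = rng.normal(size=3)
x = rng.normal(size=5)
scores = input_fisher_diag(W, b, x)   # could be reshaped and plotted as a heat map
total = input_fisher_trace(W, b, x)
```

For an image classifier, the per-input scores could be reshaped to the image dimensions and displayed as a heat map, and the scalar trace could be compared between clean and perturbed inputs. Whether this particular quantity separates adversarial from clean inputs is not something this sketch establishes.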
