# Fooling Network Interpretation in Image Classification

**Authors:** Akshayvarun Subramanya, Vipin Pillai, Hamed Pirsiavash

arXiv: 1812.02843 · 2019-09-26

## TL;DR

This paper shows that adversarial patches can be crafted to fool both a network's prediction and the interpretation algorithms used to explain it, exposing a weakness in current explanation methods. It also proposes the attack as a controlled setting for measuring the accuracy of interpretation algorithms.

## Contribution

The paper introduces adversarial patches that cause misclassification while also changing what interpretation algorithms report as the cause of the prediction. It uses this attack as a controlled setting to assess the accuracy of interpretation algorithms.

## Key findings

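- Ordinary adversarial patches are highlighted by standard interpretation algorithms, which reveals the adversary; the proposed patches fool the prediction and also mislead interpretation algorithms such as Grad-CAM.
- The attack is demonstrated with extensive experiments on Grad-CAM and transfers to occluding patch interpretation.
- The attack doubles as a controlled setting for measuring the accuracy of interpretation algorithms.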
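
To make the Grad-CAM references above concrete, here is a minimal sketch of how a Grad-CAM heatmap is formed from a layer's activations and the class-score gradients, along with one plausible way to quantify how strongly the heatmap highlights a patch. The `patch_energy_ratio` helper, its name, and the toy values are assumptions made for illustration; this summary does not state the metric or loss the paper actually uses.

```python
from typing import List

Grid = List[List[float]]  # one H x W map


def grad_cam(activations: List[Grid], gradients: List[Grid]) -> Grid:
    """Grad-CAM heatmap: ReLU(sum_k alpha_k * A_k), with alpha_k the mean gradient of channel k."""
    height, width = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for act, grad in zip(activations, gradients):
        # Channel weight: global average of the class-score gradient over the map.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += alpha * act[i][j]
    # ReLU keeps only locations with positive evidence for the class.
    return [[max(0.0, v) for v in row] for row in heatmap]


def patch_energy_ratio(heatmap: Grid, top: int, left: int, size: int) -> float:
    """Fraction of heatmap mass inside a square patch (hypothetical metric, not from the paper)."""
    total = sum(sum(row) for row in heatmap)
    if total == 0.0:
        return 0.0
    inside = sum(
        heatmap[i][j]
        for i in range(top, top + size)
        for j in range(left, left + size)
    )
    return inside / total


# Toy example: two 2x2 channels with made-up activations and gradients.
activations = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(activations, gradients)
print(patch_energy_ratio(heatmap, top=0, left=0, size=1))  # 1.0
```

Under this sketch, a ratio near 1 means the heatmap concentrates on the patch region, which is the situation that exposes an ordinary adversarial patch. An attack of the kind described here would aim for a low ratio while still changing the prediction.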

## Abstract

Deep neural networks have been shown to be fooled rather easily using adversarial attack algorithms. Practical methods such as adversarial patches have been shown to be extremely effective in causing misclassification. However, these patches are highlighted using standard network interpretation algorithms, thus revealing the identity of the adversary. We show that it is possible to create adversarial patches which not only fool the prediction, but also change what we interpret regarding the cause of the prediction. Moreover, we introduce our attack as a controlled setting to measure the accuracy of interpretation algorithms. We show this using extensive experiments for Grad-CAM interpretation that transfers to occluding patch interpretation as well. We believe our algorithms can facilitate developing more robust network interpretation tools that truly explain the network's underlying decision making process.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1812.02843/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/1812.02843/full.md

## References

35 references — full list in the complete paper: https://tomesphere.com/paper/1812.02843/full.md

---
Source: https://tomesphere.com/paper/1812.02843