# Interpretable BoW Networks for Adversarial Example Detection

**Authors:** Krishna Kanth Nakka, Mathieu Salzmann

arXiv: 1901.02229 · 2019-01-09

## TL;DR

This paper introduces interpretable Bag of visual Word (BoW) networks, which use a generative adversarial network (GAN) to associate a visual and semantic meaning with each codeword. Adversarial examples are then detected by comparing the input image with the visual representation of its most highly activated codeword.

## Contribution

The paper presents an interpretable CNN that embeds a BoW representation within the network and uses a GAN to give each codebook element a visual and semantic meaning. It then exploits this interpretability to detect adversarial examples.

## Key findings

- Outperforms state-of-the-art adversarial example detection methods on standard benchmarks
- Explains a CNN prediction through the visual representation of the most highly activated codeword
- Detection results hold independently of the attack strategy

## Abstract

The standard approach to providing interpretability to deep convolutional neural networks (CNNs) consists of visualizing either their feature maps, or the image regions that contribute the most to the prediction. In this paper, we introduce an alternative strategy to interpret the results of a CNN. To this end, we leverage a Bag of visual Word representation within the network and associate a visual and semantic meaning to the corresponding codebook elements via the use of a generative adversarial network. The reason behind the prediction for a new sample can then be interpreted by looking at the visual representation of the most highly activated codeword. We then propose to exploit our interpretable BoW networks for adversarial example detection. To this end, we build upon the intuition that, while adversarial samples look very similar to real images, to produce incorrect predictions, they should activate codewords with a significantly different visual representation. We therefore cast the adversarial example detection problem as that of comparing the input image with the most highly activated visual codeword. As evidenced by our experiments, this allows us to outperform the state-of-the-art adversarial example detection methods on standard benchmarks, independently of the attack strategy.
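The detection idea in the abstract can be illustrated with a toy sketch: pool local features into a BoW histogram, pick the most highly activated codeword, and flag the input when it looks unlike that codeword's visual representation. This is not the authors' implementation. The soft-assignment pooling, the cosine distance, the fixed threshold, and every name below (`soft_assign`, `bow_histogram`, `detect`, `codeword_embeddings`) are assumptions made for illustration; in the paper the codeword visuals come from a GAN, which is replaced here by hypothetical precomputed embedding vectors.

```python
import math

def soft_assign(feature, codebook, beta=1.0):
    """Softmax over negative squared distances from one feature to each codeword."""
    d2 = [sum((f - c) ** 2 for f, c in zip(feature, cw)) for cw in codebook]
    m = min(d2)  # subtract the minimum for numerical stability
    w = [math.exp(-beta * (d - m)) for d in d2]
    s = sum(w)
    return [x / s for x in w]

def bow_histogram(features, codebook, beta=1.0):
    """Average the soft assignments of all local features (assumed pooling)."""
    hist = [0.0] * len(codebook)
    for f in features:
        for i, a in enumerate(soft_assign(f, codebook, beta)):
            hist[i] += a / len(features)
    return hist

def cosine_distance(u, v):
    """1 - cosine similarity; a stand-in for the paper's image comparison."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def detect(features, image_embedding, codebook, codeword_embeddings, threshold=0.5):
    """Return (top codeword index, distance, flagged-as-adversarial)."""
    hist = bow_histogram(features, codebook)
    top = max(range(len(hist)), key=hist.__getitem__)
    dist = cosine_distance(image_embedding, codeword_embeddings[top])
    return top, dist, dist > threshold

# Hypothetical two-codeword setup: codebook lives in feature space,
# codeword_embeddings describe what each codeword "looks like".
codebook = [[0.0, 0.0], [4.0, 4.0]]
codeword_embeddings = [[1.0, 0.0], [0.0, 1.0]]

# Clean input: features activate codeword 0, and the image resembles it.
clean = detect([[0.1, 0.0], [0.0, 0.2]], [0.9, 0.1], codebook, codeword_embeddings)

# Adversarial-style input: the image still looks like codeword 0,
# but its features have been pushed toward codeword 1.
attacked = detect([[3.9, 4.0], [4.1, 3.8]], [0.9, 0.1], codebook, codeword_embeddings)
```

In this toy setup the clean input activates codeword 0 and falls below the threshold, while the perturbed input activates codeword 1 and is flagged, because its appearance no longer matches the codeword it activates. The threshold value is arbitrary here and would need to be chosen on held-out data in any real use.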

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.02229/full.md

## Figures

43 figures with captions in the complete paper: https://tomesphere.com/paper/1901.02229/full.md

## References

57 references — full list in the complete paper: https://tomesphere.com/paper/1901.02229/full.md

---
Source: https://tomesphere.com/paper/1901.02229