Detecting and Diagnosing Adversarial Images with Class-Conditional Capsule Reconstructions
Yao Qin, Nicholas Frosst, Sara Sabour, Colin Raffel, Garrison Cottrell, and Geoffrey Hinton

TL;DR
This paper introduces a detection method for adversarial images using class-conditional capsule reconstructions, proposes a reconstructive attack to evaluate detection robustness, and finds CapsNets outperform CNNs in aligning features with human perception.
Contribution
The paper presents a novel detection approach for adversarial images based on capsule reconstructions and analyzes the effectiveness of a new reconstructive attack against these detectors.
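The detection idea can be sketched in a few lines: reconstruct the input conditioned on the predicted class, then flag the input when the reconstruction is far from it. This is a minimal sketch, not the authors' implementation. The Euclidean distance, the fixed `threshold`, and the names `reconstruction_error`, `flag_as_adversarial`, and `toy_reconstruct` are all assumptions made for illustration, and the toy decoder simply returns a fixed prototype per class in place of a trained capsule decoder.

```python
import numpy as np

def reconstruction_error(x, x_rec):
    """Euclidean distance between an input and its reconstruction (assumed metric)."""
    return float(np.linalg.norm(np.ravel(x) - np.ravel(x_rec)))

def flag_as_adversarial(x, predicted_class, reconstruct, threshold):
    """Flag x when its class-conditional reconstruction is too far from x.

    `reconstruct` is a hypothetical callable (input, class) -> reconstruction;
    `threshold` is an assumed, externally chosen cutoff.
    """
    x_rec = reconstruct(x, predicted_class)
    return reconstruction_error(x, x_rec) > threshold

# Toy stand-in for a class-conditional decoder: one fixed prototype per class.
prototypes = {0: np.zeros(4), 1: np.ones(4)}

def toy_reconstruct(x, cls):
    return prototypes[cls]

clean = np.array([0.9, 1.0, 1.1, 1.0])        # close to the class-1 prototype
suspicious = np.array([0.1, 0.0, 0.2, 0.1])   # classified as 1 but looks like class 0

clean_flag = flag_as_adversarial(clean, 1, toy_reconstruct, threshold=0.5)
suspicious_flag = flag_as_adversarial(suspicious, 1, toy_reconstruct, threshold=0.5)
```

In this toy setup the input that resembles its predicted class passes, while the input labeled as class 1 but resembling class 0 is flagged, because conditioning the reconstruction on the wrong class yields a large error.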
Findings
CapsNets outperform CNNs in adversarial detection.
The Reconstructive Attack produces undetected adversarial examples, but with a much lower success rate.
Perturbations can make adversarial images resemble target classes visually.
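The Reconstructive Attack named in the findings above pursues two goals at once: a misclassification and a low reconstruction error. One hedged way to picture that is a single objective that adds the two terms. The sketch below is an assumption for illustration only: the cross-entropy term, the squared-error term, the `weight` parameter, and the function names are hypothetical, and the paper's actual formulation and optimization procedure may differ.

```python
import numpy as np

def cross_entropy(probs, target_class):
    """Negative log-probability of the target class (assumed misclassification term)."""
    return float(-np.log(probs[target_class] + 1e-12))

def reconstructive_objective(probs, x_adv, x_rec, target_class, weight=1.0):
    """Combined objective an attacker would try to minimize (illustrative).

    probs        : classifier output probabilities for the perturbed input
    x_adv        : perturbed input
    x_rec        : reconstruction of x_adv conditioned on the target class
    weight       : hypothetical trade-off between the two terms
    """
    misclassification_term = cross_entropy(probs, target_class)
    reconstruction_term = float(np.sum((np.ravel(x_adv) - np.ravel(x_rec)) ** 2))
    return misclassification_term + weight * reconstruction_term

probs = np.array([0.5, 0.5])
x_adv = np.array([1.0, 0.0])

# Perfect reconstruction: only the misclassification term remains.
loss_perfect = reconstructive_objective(probs, x_adv, x_adv, target_class=1)
# Reconstruction off by 1 in one coordinate: the objective grows by weight * 1.
loss_off = reconstructive_objective(probs, x_adv, np.array([0.0, 0.0]), target_class=1)
```

Under this reading, a perturbation only scores well if the classifier outputs the target class and the target-conditioned reconstruction still matches the perturbed input, which is consistent with the lower success rate reported for this attack.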
Abstract
Adversarial examples raise questions about whether neural network models are sensitive to the same visual features as humans. In this paper, we first detect adversarial examples or otherwise corrupted images based on a class-conditional reconstruction of the input. To specifically attack our detection mechanism, we propose the Reconstructive Attack, which seeks to cause both a misclassification and a low reconstruction error. This reconstructive attack produces undetected adversarial examples, but with a much smaller success rate. Across all these attacks, we find that CapsNets always perform better than convolutional networks. Then, we diagnose the adversarial examples for CapsNets and find that the success of the reconstructive attack is highly related to the visual similarity between the source and target class. Additionally, the resulting perturbations can cause the input image to appear…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Integrated Circuits and Semiconductor Failure Analysis · Digital Media Forensic Detection
