Examining the Proximity of Adversarial Examples to Class Manifolds in Deep Networks
Štefan Pócoš, Iveta Bečková, Igor Farkaš

TL;DR
This paper investigates how adversarial examples relate to class manifolds within deep neural networks, revealing that some adversarial examples remain close to the manifold of their correct class and are entangled with test data in the hidden representations.
Contribution
It introduces two novel methods for measuring distances to class-specific manifolds and analyzes the inner representations of adversarial examples across different norm constraints.
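As a rough illustration of what measuring distance to a class-specific manifold in activation space can look like, here is a minimal sketch. It is not the authors' method: it assumes the manifold of a class can be approximated by that class's training activations at a given layer, uses the mean Euclidean distance to the k nearest of them, and divides by the same quantity for the nearest other class so that the score is dimensionless and comparable across layers of different width. The names `knn_distance` and `relative_proximity` are hypothetical.

```python
import numpy as np


def knn_distance(x, points, k=5):
    """Mean Euclidean distance from x to its k nearest rows in points."""
    d = np.linalg.norm(points - x, axis=1)
    k = min(k, len(d))
    return float(np.sort(d)[:k].mean())


def relative_proximity(x, activations, labels, target_class, k=5):
    """Ratio of the kNN distance to target_class over the kNN distance
    to the closest other class (assumed proxy, not the paper's measure).

    Values below 1 mean x lies closer to target_class than to any other
    class; the ratio does not depend on the layer's dimensionality.
    """
    labels = np.asarray(labels)
    d_target = knn_distance(x, activations[labels == target_class], k)
    others = [c for c in np.unique(labels) if c != target_class]
    d_other = min(knn_distance(x, activations[labels == c], k) for c in others)
    return d_target / d_other


# Toy usage: two well-separated classes in a 2-D "activation space".
acts = np.array([[0, 0], [0, 1], [1, 0],
                 [10, 10], [10, 11], [11, 10]], dtype=float)
labs = np.array([0, 0, 0, 1, 1, 1])
query = np.array([0.0, 0.0])
print(relative_proximity(query, acts, labs, target_class=0, k=2))
```

In practice `activations` would hold one layer's hidden activations for the training set, and `x` the activation of an adversarial example at that layer; comparing the ratio for the original class and for the predicted class indicates which manifold the example sits nearer to.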
Findings
Some adversarial examples stay near the correct class manifolds.
Adversarial examples are often entangled with test set activations.
Rubbish class inputs form a separate group in activation space.
Abstract
Deep neural networks achieve remarkable performance in multiple fields. However, even after proper training, they suffer from an inherent vulnerability to adversarial examples (AEs). In this work, we shed light on the inner representations of AEs by analysing their activations in the hidden layers. We test various types of AEs, each crafted using a specific norm constraint, which affects their visual appearance and eventually their behavior in the trained networks. Our results in image classification tasks (MNIST and CIFAR-10) reveal qualitative differences between the individual types of AEs when comparing their proximity to the class-specific manifolds in the inner representations. We propose two methods that can be used to compare the distances to class-specific manifolds, regardless of the changing dimensions throughout the network. Using these methods, we consistently confirm that…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Integrated Circuits and Semiconductor Failure Analysis
