Exploring Layerwise Adversarial Robustness Through the Lens of t-SNE
Inês Valentim, Nuno Antunes, Nuno Lourenço

TL;DR
This paper introduces a novel method using t-SNE to visualize and quantify layerwise adversarial robustness in image classifiers, revealing early-layer vulnerabilities through a new metric and visual analysis.
Contribution
It proposes a t-SNE-based metric for assessing adversarial robustness at individual layers of a neural network, providing insight into vulnerabilities in the early layers.
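The summary does not give the metric's formula, so what follows is only a minimal sketch of one plausible neighbor-based score, not the authors' metric. It assumes the clean and perturbed samples of one layer were embedded together in a single 2-D t-SNE map. For each perturbed point, it checks whether the point's k nearest clean neighbors mostly share the point's true label. The function name `knn_label_agreement` and the default `k=5` are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[float, float]


def knn_label_agreement(
    clean_xy: Sequence[Point],
    clean_labels: Sequence[int],
    perturbed_xy: Sequence[Point],
    true_labels: Sequence[int],
    k: int = 5,
) -> float:
    """Hypothetical score: fraction of perturbed points whose k nearest
    clean neighbors (in the joint 2-D t-SNE map) mostly carry the point's
    true label. 1.0 means every perturbed sample stays inside its own
    class's clean cluster; lower values mean samples drifted elsewhere.
    """
    if not clean_xy or len(perturbed_xy) != len(true_labels):
        raise ValueError("need clean points and one label per perturbed point")
    k = min(k, len(clean_xy))
    hits = 0
    for p, y in zip(perturbed_xy, true_labels):
        # Indices of clean points sorted by Euclidean distance to p.
        nearest = sorted(range(len(clean_xy)),
                         key=lambda j: math.dist(p, clean_xy[j]))[:k]
        # Majority vote; ties go to the label with the closest neighbor.
        majority, _ = Counter(clean_labels[j] for j in nearest).most_common(1)[0]
        hits += majority == y
    return hits / len(perturbed_xy)


# Toy map: two clean clusters; the second perturbed sample (true class 0)
# has been pushed into the class-1 cluster.
clean = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
clean_y = [0, 0, 1, 1]
print(knn_label_agreement(clean, clean_y, [(0.05, 0.1), (4.9, 5.1)], [0, 0], k=2))
```

Neighbor-based comparisons suit t-SNE because it preserves local neighborhoods but not global distances, so raw coordinate distances between far-apart clusters carry little meaning.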
Findings
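Weak spots appear early, in the feature-extraction layers, and carry over into classification.
The metric's results agree with the visual analysis of the t-SNE maps.
The same pattern holds for both networks studied: one designed by humans and one found via NeuroEvolution.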
Abstract
Adversarial examples, designed to trick Artificial Neural Networks (ANNs) into producing wrong outputs, highlight vulnerabilities in these models. Exploring these weaknesses is crucial for developing defenses, so we propose a method to assess the adversarial robustness of image-classifying ANNs. The t-distributed Stochastic Neighbor Embedding (t-SNE) technique is used for visual inspection, and a metric that compares the clean and perturbed embeddings helps pinpoint weak spots in the layers. Analyzing two ANNs on CIFAR-10, one designed by humans and another via NeuroEvolution, we found that differences between clean and perturbed representations emerge early on, in the feature extraction layers, affecting subsequent classification. The findings obtained with our metric are supported by the visual analysis of the t-SNE maps.
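The layerwise procedure in the abstract can be sketched roughly as below, under stated assumptions: activations for the clean and perturbed inputs have already been collected for each layer as NumPy arrays with matching sample order, and scikit-learn's `TSNE` is available for the embedding. The per-layer score (whether each perturbed point keeps its own clean counterpart among its k nearest clean neighbors) and all function names (`tsne_embed`, `counterpart_retention`, `layerwise_scores`, `first_weak_layer`) are hypothetical illustrations, not the paper's metric.

```python
from typing import Callable, List, Optional, Sequence

import numpy as np

# Maps an (n_points, n_features) array to (n_points, 2) coordinates.
Embed = Callable[[np.ndarray], np.ndarray]


def tsne_embed(x: np.ndarray, seed: int = 0) -> np.ndarray:
    """Joint 2-D t-SNE map via scikit-learn (imported lazily)."""
    from sklearn.manifold import TSNE

    # scikit-learn requires perplexity < number of samples.
    perplexity = min(30.0, max(1.0, (len(x) - 1) / 3))
    return TSNE(n_components=2, perplexity=perplexity, init="pca",
                random_state=seed).fit_transform(x)


def counterpart_retention(xy: np.ndarray, n: int, k: int = 5) -> float:
    """Hypothetical score: share of samples whose clean point is among the
    k nearest clean neighbors of its perturbed point.

    Rows 0..n-1 of xy are clean samples; rows n..2n-1 are their
    perturbed versions in the same order.
    """
    clean, pert = xy[:n], xy[n:]
    # dist[i, j] = distance from perturbed point i to clean point j.
    dist = np.linalg.norm(pert[:, None, :] - clean[None, :, :], axis=-1)
    nearest = np.argsort(dist, axis=1)[:, :min(k, n)]
    kept = np.any(nearest == np.arange(n)[:, None], axis=1)
    return float(kept.mean())


def layerwise_scores(clean_acts: Sequence[np.ndarray],
                     pert_acts: Sequence[np.ndarray],
                     k: int = 5,
                     embed: Embed = tsne_embed) -> List[float]:
    """One score per layer; each activation array is (n_samples, ...)."""
    scores = []
    for clean, pert in zip(clean_acts, pert_acts):
        n = len(clean)
        # Flatten each sample's features and embed both sets in ONE map.
        joint = np.concatenate([clean.reshape(n, -1), pert.reshape(n, -1)])
        scores.append(counterpart_retention(embed(joint), n, k))
    return scores


def first_weak_layer(scores: Sequence[float],
                     threshold: float = 0.5) -> Optional[int]:
    """Index of the earliest layer whose score drops below threshold."""
    for i, score in enumerate(scores):
        if score < threshold:
            return i
    return None
```

Both sets go into one t-SNE run per layer because separate runs produce unrelated coordinate systems, so points from different maps cannot be compared directly. The `embed` parameter lets the pipeline be exercised with a cheap stand-in projection; with a real network, the per-layer activations could be captured with, for example, PyTorch's `register_forward_hook`.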
