Exploring Layerwise Adversarial Robustness Through the Lens of t-SNE

Inês Valentim; Nuno Antunes; Nuno Lourenço

arXiv:2406.14073·cs.LG·June 21, 2024

Open Access

TL;DR

This paper proposes a method that uses t-SNE to visualize and quantify layerwise adversarial robustness in image classifiers. Its metric and the accompanying visual analysis show that vulnerabilities emerge early, in the feature extraction layers.

Contribution

It proposes a t-SNE-based metric that compares the clean and perturbed embeddings at each layer of a neural network, helping to pinpoint the layers where adversarial perturbations first take effect.

Findings

01

Differences between clean and perturbed representations emerge early, in the feature extraction layers, and affect subsequent classification.

02

The findings obtained with the metric are supported by visual analysis of the t-SNE maps.

03

The analysis covers two networks on CIFAR-10, one designed by humans and one obtained via NeuroEvolution.

Abstract

Adversarial examples, designed to trick Artificial Neural Networks (ANNs) into producing wrong outputs, highlight vulnerabilities in these models. Exploring these weaknesses is crucial for developing defenses, and so, we propose a method to assess the adversarial robustness of image-classifying ANNs. The t-distributed Stochastic Neighbor Embedding (t-SNE) technique is used for visual inspection, and a metric, which compares the clean and perturbed embeddings, helps pinpoint weak spots in the layers. Analyzing two ANNs on CIFAR-10, one designed by humans and another via NeuroEvolution, we found that differences between clean and perturbed representations emerge early on, in the feature extraction layers, affecting subsequent classification. The findings with our metric are supported by the visual analysis of the t-SNE maps.
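
The abstract does not define the metric, so the following is only a minimal sketch of what a layerwise comparison of clean and perturbed embeddings could look like, not the authors' method. It assumes that, for each layer, the clean and perturbed activations have already been projected into a shared 2-D t-SNE map (separate t-SNE runs are generally not comparable), and it uses the mean Euclidean displacement between paired points as a stand-in score. The function names, layer names, and toy coordinates are all hypothetical.

```python
import math


def mean_displacement(clean, perturbed):
    """Mean Euclidean distance between paired 2-D points.

    Stand-in score (an assumption, not the paper's metric): point i in
    `clean` and point i in `perturbed` come from the same input image.
    """
    if len(clean) != len(perturbed) or not clean:
        raise ValueError("need two non-empty embeddings of equal length")
    total = sum(math.dist(c, p) for c, p in zip(clean, perturbed))
    return total / len(clean)


def layerwise_scores(embeddings):
    """Map each layer name to its displacement score.

    `embeddings` is a dict: layer name -> (clean_points, perturbed_points),
    in network order. Each entry is assumed to come from one joint t-SNE map.
    """
    return {name: mean_displacement(c, p) for name, (c, p) in embeddings.items()}


# Toy, hand-made coordinates (not real t-SNE output or data from the paper).
toy = {
    "conv1": ([(0.0, 0.0), (1.0, 1.0)], [(0.0, 0.0), (1.0, 1.0)]),
    "conv2": ([(0.0, 0.0), (1.0, 1.0)], [(3.0, 4.0), (1.0, 1.0)]),
}

scores = layerwise_scores(toy)

# First layer (in insertion order) whose clean and perturbed points diverge.
first_shift = next((name for name, s in scores.items() if s > 0), None)
```

With real data, the per-layer inputs would be activations extracted from the network for clean images and their adversarial counterparts. A score that rises in an early layer would be consistent with the paper's observation that differences emerge in the feature extraction layers.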

Taxonomy

Topics: Advanced Malware Detection Techniques · Physical Unclonable Functions (PUFs) and Hardware Security