Visualizing Representations of Adversarially Perturbed Inputs

Daniel Steinberg; Paul Munro

arXiv:2105.14116 · cs.LG · June 1, 2021

TL;DR

This paper introduces POP-N, a new metric for evaluating how well low-dimensional projections visualize the neural network activations of adversarially perturbed inputs, with the goal of helping to understand model vulnerabilities.

Contribution

The paper proposes the POP-N metric for assessing visualization quality of adversarially perturbed neural representations and demonstrates its application on CIFAR-10 data.

Findings

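01. POP-2 scores vary across dimensionality reduction algorithms and adversarial attacks.

02. 2D projections with high POP-2 scores yield effective visualizations.

03. These visualizations help interpret how adversarial attacks affect a network's intermediate representations.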

Abstract

It has been shown that deep learning models are vulnerable to adversarial attacks. We seek to further understand the consequences of such attacks on the intermediate activations of neural networks. We present an evaluation metric, POP-N, which scores the effectiveness of projecting data to N dimensions in the context of visualizing representations of adversarially perturbed inputs. We conduct experiments on CIFAR-10 to compare the POP-2 scores of several dimensionality reduction algorithms across various adversarial attacks. Finally, we use the 2D data corresponding to high POP-2 scores to generate example visualizations.
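The abstract describes a pipeline: perturb inputs adversarially, collect a network's intermediate activations for both clean and perturbed inputs, and project them to N dimensions. Below is a minimal NumPy sketch of that pipeline under stated assumptions. A hypothetical toy MLP with random weights stands in for a trained CIFAR-10 network, FGSM is used as one example attack, and PCA is used as one example dimensionality reduction algorithm. The POP-N score itself is not reproduced here, and the helper names (`init_mlp`, `forward`, `fgsm`, `pca_project`) are hypothetical, not taken from dstein64/vrapi.

```python
import numpy as np

def init_mlp(rng, d_in, d_hidden, n_classes):
    """Hypothetical toy MLP (random weights) standing in for a trained network."""
    return {
        "W1": rng.normal(0, 1 / np.sqrt(d_in), (d_hidden, d_in)),
        "b1": np.zeros(d_hidden),
        "W2": rng.normal(0, 1 / np.sqrt(d_hidden), (n_classes, d_hidden)),
        "b2": np.zeros(n_classes),
    }

def forward(params, X):
    """Return pre-activations, hidden (intermediate) activations, and class probabilities."""
    pre = X @ params["W1"].T + params["b1"]          # (n, d_hidden)
    hidden = np.maximum(pre, 0.0)                     # ReLU representation we visualize
    logits = hidden @ params["W2"].T + params["b2"]   # (n, n_classes)
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability for softmax
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    return pre, hidden, probs

def fgsm(params, X, y, eps):
    """FGSM: move each input by eps along the sign of the cross-entropy input gradient."""
    pre, _, probs = forward(params, X)
    grad_logits = probs.copy()
    grad_logits[np.arange(len(y)), y] -= 1.0          # dL/dlogits = p - onehot(y)
    grad_hidden = grad_logits @ params["W2"]          # backprop through the output layer
    grad_pre = grad_hidden * (pre > 0)                # backprop through the ReLU
    grad_X = grad_pre @ params["W1"]                  # backprop to the input
    # Clip to the valid pixel range [0, 1]; this never enlarges the perturbation.
    return np.clip(X + eps * np.sign(grad_X), 0.0, 1.0)

def pca_project(A, n_components=2):
    """Project rows of A onto their top principal components (computed via SVD)."""
    centered = A - A.mean(axis=0)
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:n_components].T

rng = np.random.default_rng(0)
params = init_mlp(rng, d_in=32, d_hidden=16, n_classes=10)
X = rng.uniform(0.0, 1.0, (100, 32))                  # stand-in for flattened images
y = rng.integers(0, 10, 100)                          # stand-in labels
eps = 8 / 255
X_adv = fgsm(params, X, y, eps)

_, H_clean, _ = forward(params, X)
_, H_adv, _ = forward(params, X_adv)

# Fit one projection on clean and perturbed activations together.
Z = pca_project(np.vstack([H_clean, H_adv]), n_components=2)
Z_clean, Z_adv = Z[:100], Z[100:]
shift = np.linalg.norm(Z_adv - Z_clean, axis=1)       # 2D displacement per input
print(Z.shape)  # (200, 2)
```

Fitting a single projection on the stacked clean and perturbed activations keeps both sets in one shared 2D coordinate system. As a result, each entry of `shift` measures how far the perturbation moves that input's representation in the resulting plot. Swapping PCA for another algorithm (for example t-SNE or UMAP) would change only the projection step.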


Code & Models

Repositories

dstein64/vrapi
pytorch · Official


Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science