Disrupting Adversarial Transferability in Deep Neural Networks

Christopher Wiedeman; Ge Wang

arXiv:2108.12492·cs.LG·February 24, 2023

TL;DR

This paper investigates the reasons behind adversarial transferability in deep neural networks, revealing that high linear correlation in features contributes to transferability, and proposes a method to decorrelate features to reduce this effect.

Contribution

The paper introduces a feature correlation loss and a Dual Neck Autoencoder to create models with less transferable adversarial attacks by decorrelating learned features.

Findings

01 Feature correlation between models explains transferability.

02 Decorrelating features reduces attack transferability.

03 Dual Neck Autoencoder generates diverse encodings with lower transferability.

Abstract

Adversarial attack transferability is well-recognized in deep learning. Prior work has partially explained transferability by recognizing common adversarial subspaces and correlations between decision boundaries, but little is known beyond this. We propose that transferability between seemingly different models is due to a high linear correlation between the feature sets that different networks extract. In other words, two models trained on the same task that are distant in the parameter space likely extract features in the same fashion, just with trivial affine transformations between the latent spaces. Furthermore, we show how applying a feature correlation loss, which decorrelates the extracted features in a latent space, can reduce the transferability of adversarial attacks between models, suggesting that the models complete tasks in semantically different ways. Finally, we propose…
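To make the idea of linear correlation between feature sets concrete, here is a minimal sketch in plain Python. It is not the paper's feature correlation loss, whose exact form is not given in this abstract. It assumes a simple stand-in measure: the mean squared Pearson correlation over all pairs of features taken from two models evaluated on the same inputs. The function names `pearson` and `mean_squared_cross_correlation` and the toy feature values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def mean_squared_cross_correlation(feats_a, feats_b):
    """Average squared correlation over every (feature of A, feature of B) pair.

    feats_a and feats_b are lists of per-sample feature vectors computed
    on the same inputs. A value near 0 means the two feature sets are
    close to linearly uncorrelated under this pairwise measure.
    """
    cols_a = list(zip(*feats_a))  # one tuple per feature of model A
    cols_b = list(zip(*feats_b))  # one tuple per feature of model B
    total = sum(pearson(ca, cb) ** 2 for ca in cols_a for cb in cols_b)
    return total / (len(cols_a) * len(cols_b))


# Toy features of a hypothetical model A on four inputs.
feats_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Hypothetical model B: each feature is a scaled and shifted copy of A's.
feats_b = [[2.0 * u + 3.0, -v + 1.0] for u, v in feats_a]

# Hypothetical model C: a single XOR-like feature, linearly uncorrelated with A.
feats_c = [[0.0], [1.0], [1.0], [0.0]]

score_ab = mean_squared_cross_correlation(feats_a, feats_b)
score_ac = mean_squared_cross_correlation(feats_a, feats_c)
```

In this toy setup the per-feature affine copy (model B) scores much higher against model A than the XOR-like feature (model C) does. A training penalty in the spirit of the abstract would push such a score down. Note that this pairwise measure does not detect general affine mixing across features, so it is only a rough proxy for the relationship the abstract describes.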

Code & Models

Repositories

wang-axis/dna (PyTorch, official)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications