Disrupting Adversarial Transferability in Deep Neural Networks
Christopher Wiedeman, Ge Wang

TL;DR
This paper investigates why adversarial attacks transfer between deep neural networks. It finds that a high linear correlation between the features that different models extract contributes to transferability, and it proposes a method that decorrelates these features to reduce transferability.
Contribution
The paper introduces a feature correlation loss and a Dual Neck Autoencoder, which decorrelate learned features to produce models whose adversarial examples transfer less readily to one another.
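The source does not spell out the feature correlation loss. The sketch below assumes it penalizes the Pearson cross-correlation between two models' latent features on the same batch. The hypothetical `feature_correlation_loss` averages the squared entries of that cross-correlation matrix. NumPy stands in for an autodiff framework. In training, the same expression would presumably run on differentiable tensors, for example with one model frozen, and be added to the task loss. The authors' exact formulation may differ.

```python
import numpy as np


def feature_correlation_loss(feats_a: np.ndarray, feats_b: np.ndarray, eps: float = 1e-8) -> float:
    """Mean squared Pearson cross-correlation between two models' features.

    feats_a: (n_samples, d_a) latent features from model A for a batch of inputs.
    feats_b: (n_samples, d_b) latent features from model B for the same inputs.
    Returns a value in [0, 1]; 0 means every A feature is uncorrelated with every B feature.
    (Hypothetical formulation, not necessarily the paper's loss.)
    """
    n = feats_a.shape[0]
    # Standardize each feature column over the batch (zero mean, unit variance).
    za = (feats_a - feats_a.mean(axis=0)) / (feats_a.std(axis=0) + eps)
    zb = (feats_b - feats_b.mean(axis=0)) / (feats_b.std(axis=0) + eps)
    # Entry (i, j) is the correlation between feature i of A and feature j of B.
    cross_corr = za.T @ zb / n
    return float(np.mean(cross_corr ** 2))


rng = np.random.default_rng(0)
feats_a = rng.normal(size=(1000, 4))
# Correlated case: model B's features are an affine function of model A's.
feats_b_affine = feats_a @ rng.normal(size=(4, 3)) + 2.0
# Decorrelated case: model B's features are independent of model A's.
feats_b_indep = rng.normal(size=(1000, 3))

# About 0.25: each B feature's squared correlations with A's 4 features sum to ~1.
print(feature_correlation_loss(feats_a, feats_b_affine))
# Near 0: independent features are only correlated by sampling noise.
print(feature_correlation_loss(feats_a, feats_b_indep))
```

Driving every cross-correlation entry to zero also rules out any affine relationship between the two feature sets. The covariance of a feature in one model with any linear combination of the other model's features is a linear combination of these pairwise covariances.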
Findings
High linear correlation between the features extracted by different models helps explain transferability.
Decorrelating features reduces attack transferability.
The Dual Neck Autoencoder generates diverse encodings across which adversarial attacks transfer less.
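Transferability is commonly quantified as how often adversarial examples crafted against a source model also fool a separate target model. The sketch below computes one version of that rate, restricted to examples that actually fool the source. A lower value is what feature decorrelation aims for. The function name `transfer_rate` and the toy labels are illustrative. The paper may report transferability differently, for example as target-model accuracy on transferred attacks.

```python
import numpy as np


def transfer_rate(y_true, source_pred_adv, target_pred_adv) -> float:
    """Fraction of source-fooling adversarial examples that also fool the target model.

    All arguments are 1-D label arrays indexed over the same adversarial examples:
    the true labels, the source model's predictions, and the target model's predictions.
    """
    y_true = np.asarray(y_true)
    fooled_source = np.asarray(source_pred_adv) != y_true
    fooled_target = np.asarray(target_pred_adv) != y_true
    if not fooled_source.any():
        return 0.0  # no successful source attacks, so nothing can transfer
    # Among examples that fooled the source, the share that also fool the target.
    return float(np.mean(fooled_target[fooled_source]))


y_true = np.array([0, 1, 2, 3, 4])
source_pred = np.array([1, 2, 3, 4, 4])  # source fooled on the first four examples
target_pred = np.array([1, 1, 0, 3, 0])  # target fooled on examples 0 and 2 of those four
print(transfer_rate(y_true, source_pred, target_pred))  # 0.5
```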
Abstract
Adversarial attack transferability is well-recognized in deep learning. Prior work has partially explained transferability by recognizing common adversarial subspaces and correlations between decision boundaries, but little is known beyond this. We propose that transferability between seemingly different models is due to a high linear correlation between the feature sets that different networks extract. In other words, two models trained on the same task that are distant in the parameter space likely extract features in the same fashion, just with trivial affine transformations between the latent spaces. Furthermore, we show how applying a feature correlation loss, which decorrelates the extracted features in a latent space, can reduce the transferability of adversarial attacks between models, suggesting that the models complete tasks in semantically different ways. Finally, we propose…
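The abstract claims that different models' latent spaces are related by "trivial affine transformations". This can be checked by fitting an affine map from one model's features to another's and measuring how much variance it explains. The sketch below assumes features have already been extracted as matrices over the same inputs. `affine_r2` is a hypothetical helper, and random matrices stand in for real network activations. This is not necessarily the correlation measure the authors use.

```python
import numpy as np


def affine_r2(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Pooled R^2 of the least-squares affine map from feats_a to feats_b.

    feats_a: (n_samples, d_a) features from model A.
    feats_b: (n_samples, d_b) features from model B on the same inputs.
    A value near 1 means B's features are almost an affine transform of A's.
    """
    n = feats_a.shape[0]
    # Append a column of ones so the fitted map includes a bias (translation) term.
    design = np.hstack([feats_a, np.ones((n, 1))])
    # Solve min ||design @ coef - feats_b||^2 for all output columns at once.
    coef, *_ = np.linalg.lstsq(design, feats_b, rcond=None)
    residual = feats_b - design @ coef
    ss_res = np.sum(residual ** 2)
    ss_tot = np.sum((feats_b - feats_b.mean(axis=0)) ** 2)
    return float(1.0 - ss_res / ss_tot)


rng = np.random.default_rng(0)
feats_a = rng.normal(size=(500, 8))
# Model B's features as an exact affine transform of model A's.
feats_b_affine = feats_a @ rng.normal(size=(8, 8)) + rng.normal(size=8)
# Model B's features unrelated to model A's.
feats_b_random = rng.normal(size=(500, 8))

print(affine_r2(feats_a, feats_b_affine))  # essentially 1.0
print(affine_r2(feats_a, feats_b_random))  # close to 0
```

Pooling the residual and total sums of squares over all output columns weights each of model B's features by its variance. A per-feature R^2 would be an equally reasonable choice.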
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications
