Closer Look at the Transferability of Adversarial Examples: How They Fool Different Models Differently
Futa Waseda, Sosuke Nishikawa, Trung-Nghia Le, Huy H. Nguyen, and Isao Echizen

TL;DR
This paper investigates how adversarial examples transfer between models. It finds that they often make the target model predict the same wrong class as the source model, but can also cause different errors, and it attributes the difference to how the models use non-robust features.
Contribution
The paper introduces a class-aware analysis of adversarial transferability that distinguishes same mistakes from different mistakes, and it links both outcomes to non-robust features that models use differently.
Findings
Adversarial examples often cause same mistakes across models.
Different mistakes can occur even between similar models.
Non-robust features explain class-aware transferability.
Abstract
Deep neural networks are vulnerable to adversarial examples (AEs), which have adversarial transferability: AEs generated for the source model can mislead another (target) model's predictions. However, the transferability has not been understood in terms of which class the target model's predictions are misled to (i.e., class-aware transferability). In this paper, we differentiate the cases in which a target model predicts the same wrong class as the source model ("same mistake") or a different wrong class ("different mistake") to analyze and provide an explanation of the mechanism. We find that (1) AEs tend to cause same mistakes, which correlates with "non-targeted transferability"; however, (2) different mistakes occur even between similar models, regardless of the perturbation size. Furthermore, we present evidence that the difference between same mistakes and different mistakes can be…
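The same-mistake / different-mistake distinction described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function names, the outcome labels, and the choice to count only AEs that actually fool the source model are assumptions made here for illustration.

```python
from collections import Counter


def class_aware_outcome(y_true, src_pred, tgt_pred):
    """Label one adversarial example by how the target model responds.

    Hypothetical helper: the label strings are illustrative, not from the paper.
    """
    if src_pred == y_true:
        # The AE did not fool the source model, so it is excluded below.
        return "source_not_fooled"
    if tgt_pred == y_true:
        return "unfooled"
    if tgt_pred == src_pred:
        # Target predicts the same wrong class as the source model.
        return "same_mistake"
    # Target is wrong, but predicts a different wrong class.
    return "different_mistake"


def class_aware_rates(y_true, src_preds, tgt_preds):
    """Fraction of each outcome among AEs that fooled the source model."""
    outcomes = [
        class_aware_outcome(y, s, t)
        for y, s, t in zip(y_true, src_preds, tgt_preds)
    ]
    counts = Counter(o for o in outcomes if o != "source_not_fooled")
    total = sum(counts.values())
    if total == 0:
        return {}
    keys = ("unfooled", "same_mistake", "different_mistake")
    return {k: counts[k] / total for k in keys}


if __name__ == "__main__":
    # Toy labels and predictions (made up for illustration only).
    y_true = [0, 0, 0, 0, 1]
    src_preds = [1, 1, 1, 2, 1]
    tgt_preds = [0, 1, 2, 2, 0]
    print(class_aware_rates(y_true, src_preds, tgt_preds))
```

On the toy inputs above, the last example is dropped because the source model is still correct, and the remaining four split into one unfooled case, two same mistakes, and one different mistake. The sum of the two mistake rates corresponds to the usual non-targeted notion of transfer, where any wrong prediction by the target counts.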
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
