Interpreting Attributions and Interactions of Adversarial Attacks
Xin Wang, Shuyun Lin, Hao Zhang, Yufei Zhu, Quanshi Zhang

TL;DR
This paper interprets adversarial attacks by estimating Shapley-value attributions of image regions and quantifying interactions among perturbation pixels, revealing differences in how adversarially-trained and normally-trained DNNs are attacked.
Contribution
It proposes a novel attribution and interaction analysis framework for adversarial perturbations using Shapley values, offering new insights into attack mechanisms.
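For reference, the standard Shapley value from cooperative game theory is given below. Reading the players N as image regions (or perturbation pixels) and v(S) as the decrease of the attacking cost when only the perturbations in S are applied is an assumption based on the abstract, not the paper's exact notation.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}
  \,\bigl[\, v(S \cup \{i\}) - v(S) \,\bigr]
```

The attributions satisfy the efficiency property: they sum to v(N) - v(∅), so the total effect of the perturbation is fully distributed over the players.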
Findings
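Adversarially-trained DNNs have more perturbation components in the foreground than normally-trained DNNs.
Adversarially-trained DNNs have more components that mainly decrease the score of the true category than normally-trained DNNs.
Decomposing the perturbation map into relatively independent components exposes these structural differences between the two training methods.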
Abstract
This paper aims to explain adversarial attacks in terms of how adversarial perturbations contribute to the attacking task. We estimate attributions of different image regions to the decrease of the attacking cost based on the Shapley value. We define and quantify interactions among adversarial perturbation pixels, and decompose the entire perturbation map into relatively independent perturbation components. The decomposition of the perturbation map shows that adversarially-trained DNNs have more perturbation components in the foreground than normally-trained DNNs. Moreover, compared to the normally-trained DNN, the adversarially-trained DNN has more components that mainly decrease the score of the true category. These analyses provide new insights into the understanding of adversarial attacks.
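As a rough illustration of Shapley-based region attribution, the sketch below estimates Shapley values by permutation sampling, a common approximation. It is not the paper's implementation: `toy_value`, the linear scoring weights `w`, the per-region perturbation `delta`, and the region layout are all hypothetical stand-ins for "decrease of the attacking cost when only some regions are perturbed".

```python
import numpy as np

def shapley_permutation_estimate(value_fn, n_players, n_samples=200, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    over randomly sampled player orderings."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(n_players)
    for _ in range(n_samples):
        order = rng.permutation(n_players)
        mask = np.zeros(n_players, dtype=bool)  # start from the empty coalition
        prev = value_fn(mask)
        for i in order:
            mask[i] = True                      # add player i to the coalition
            cur = value_fn(mask)
            phi[i] += cur - prev                # marginal contribution of i
            prev = cur
    return phi / n_samples

# Hypothetical toy setup: 4 image regions with 3 pixels each.
toy_rng = np.random.default_rng(1)
n_regions, region_size = 4, 3
w = toy_rng.normal(size=(n_regions, region_size))      # assumed linear score weights
delta = toy_rng.normal(size=(n_regions, region_size))  # assumed perturbation per region

def toy_value(mask):
    """Stand-in for the attacking-cost decrease when only the regions
    selected by `mask` carry their perturbation (linear toy model)."""
    return float(np.sum(w[mask] * delta[mask]))

phi = shapley_permutation_estimate(toy_value, n_regions)
exact = np.sum(w * delta, axis=1)  # for a linear game, each region's own term
```

Because the toy value function is linear, every marginal contribution of a region equals its own term, so `phi` matches `exact` up to floating-point error. With a real DNN the value function is non-linear, the estimate becomes noisy, and each evaluation of `value_fn` costs one forward pass.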
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Bacillus and Francisella bacterial research
