DI-AA: An Interpretable White-box Attack for Fooling Deep Neural Networks
Yixiang Wang, Jiqiang Liu, Xiaolin Chang, Jianhua Wang, Ricardo J. Rodríguez

TL;DR
DI-AA introduces an interpretable white-box adversarial attack leveraging deep Taylor decomposition, effectively reducing perturbations and breaking both non-robust and adversarially trained models, with notable success in transfer attacks.
Contribution
This paper presents DI-AA, an interpretable white-box attack that uses deep Taylor decomposition to select the most contributing features and Lagrangian relaxation optimization of the logit output and L_p norm to further decrease the perturbation.
Findings
Achieves lower perturbation compared to baseline attacks.
Effectively breaks TRADES adversarially trained models.
Reduces robust accuracy of black-box models by 16-31%.
Abstract
White-box Adversarial Example (AE) attacks against Deep Neural Networks (DNNs) are more destructive than black-box AE attacks. However, almost all white-box approaches lack interpretation from the point of view of DNNs: adversaries have not investigated the attacks from the perspective of interpretable features, and few of these approaches consider what features the DNN actually learns. In this paper, we propose an interpretable white-box AE attack approach, DI-AA, which applies the interpretable deep Taylor decomposition approach to select the most contributing features and adopts Lagrangian relaxation optimization of the logit output and the L_p norm to further decrease the perturbation. We compare DI-AA with six baseline attacks (including the state-of-the-art attack AutoAttack) on…
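The two ingredients named in the abstract, relevance-based feature selection and a Lagrangian trade-off between a logit objective and a perturbation norm, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' algorithm: the model is a single linear layer, input-times-weight relevance stands in for deep Taylor decomposition, the penalty is a squared L_2 norm, and the names `relevance_linear`, `masked_attack`, `k`, `lam`, `lr`, and `steps` are hypothetical.

```python
import numpy as np

def relevance_linear(W, x, target):
    # Input-times-weight contribution of each feature to the target logit.
    # Assumption: a simple stand-in for deep Taylor decomposition on one linear layer.
    return W[target] * x

def masked_attack(W, b, x, label, k=2, lam=0.1, lr=0.1, steps=200):
    logits = W @ x + b
    # Strongest competing class (the true class is excluded from the argmax).
    competitors = np.where(np.arange(len(b)) == label, -np.inf, logits)
    other = int(np.argmax(competitors))

    # Only the k features most relevant to the true class may be perturbed.
    rel = relevance_linear(W, x, label)
    mask = np.zeros_like(x)
    mask[np.argsort(-np.abs(rel))[:k]] = 1.0

    delta = np.zeros_like(x)
    for _ in range(steps):
        z = W @ (x + delta) + b
        if z[other] > z[label]:
            break  # prediction has flipped, stop growing the perturbation
        # Gradient of (z[label] - z[other]) + lam * ||delta||_2^2 w.r.t. delta.
        grad = (W[label] - W[other]) + 2.0 * lam * delta
        delta -= lr * mask * grad
    return delta

# Toy 3-class linear classifier on a 4-feature input (illustrative values).
W = np.array([[1.0, 0.5, -0.2, 0.0],
              [-0.5, 1.0, 0.3, 0.1],
              [0.2, -0.3, 1.0, 0.4]])
b = np.zeros(3)
x = np.array([2.0, 0.5, 0.1, 1.0])
label = int(np.argmax(W @ x + b))

delta = masked_attack(W, b, x, label)
adv_label = int(np.argmax(W @ (x + delta) + b))
```

In this sketch the mask keeps the perturbation confined to the selected features, while `lam` controls how strongly the norm penalty resists further growth of `delta`. A real attack on a deep network would replace the hand-written gradient with automatic differentiation and would use the paper's own relevance and optimization procedures.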
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science
Methods: Autoencoders
