DI-AA: An Interpretable White-box Attack for Fooling Deep Neural Networks

Yixiang Wang; Jiqiang Liu; Xiaolin Chang; Jianhua Wang; Ricardo J. Rodríguez

arXiv:2110.07305·cs.LG·October 15, 2021

TL;DR

DI-AA introduces an interpretable white-box adversarial attack leveraging deep Taylor decomposition, effectively reducing perturbations and breaking both non-robust and adversarially trained models, with notable success in transfer attacks.

Contribution

This paper presents DI-AA, an interpretable white-box attack that uses deep Taylor decomposition to select the most contributing features and a Lagrangian relaxation over the logit output and L_p norm to further reduce the perturbation.

Findings

01. Achieves lower perturbation compared to baseline attacks.

02. Effectively breaks TRADES adversarially trained models.

03. Reduces robust accuracy of black-box models by 16-31%.

Abstract

White-box Adversarial Example (AE) attacks on Deep Neural Networks (DNNs) are more destructive than black-box AE attacks. However, almost all white-box approaches lack interpretation from the point of view of DNNs. That is, adversaries have not investigated the attacks from the perspective of interpretable features, and few of these approaches consider what features the DNN actually learns. In this paper, we propose an interpretable white-box AE attack approach, DI-AA, which applies the interpretable approach of deep Taylor decomposition to select the most contributing features and adopts a Lagrangian relaxation optimization of the logit output and L_p norm to further decrease the perturbation. We compare DI-AA with six baseline attacks (including the state-of-the-art attack AutoAttack) on…
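
The two ingredients named in the abstract (selecting the most contributing features, then minimizing a relaxed objective that trades off logit output against perturbation size) can be illustrated with a toy sketch. This is not the authors' algorithm: it assumes a linear classifier, uses the per-feature product `W[label] * x` as a simple stand-in for a relevance score rather than deep Taylor decomposition itself, and picks a squared L_2 penalty with a fixed multiplier `lam`. All function names and parameters are hypothetical.

```python
import numpy as np

def relevance_scores(W, x, label):
    # Assumption: for a linear logit z = W[label] @ x, feature i contributes
    # W[label, i] * x[i]. This stands in for a relevance map; it is NOT
    # deep Taylor decomposition.
    return W[label] * x

def top_k_mask(scores, k):
    # Binary mask that keeps the k features with the largest scores.
    mask = np.zeros_like(scores)
    mask[np.argsort(scores)[-k:]] = 1.0
    return mask

def relaxed_objective_grad(W, b, x, delta, label, lam):
    # Relaxed objective (an assumed form):
    #   0.5 * ||delta||_2^2 + lam * max(z_label - max_{j != label} z_j, 0)
    logits = W @ (x + delta) + b
    others = logits.copy()
    others[label] = -np.inf
    runner_up = int(np.argmax(others))
    margin = logits[label] - logits[runner_up]
    grad = delta.copy()                      # gradient of the norm penalty
    if margin > 0:                           # subgradient of the hinge term
        grad = grad + lam * (W[label] - W[runner_up])
    return grad

def masked_lagrangian_attack(W, b, x, label, k=1, lam=1.0, lr=0.1, steps=200):
    # Gradient descent on the relaxed objective, restricted to the
    # selected features; stops as soon as the predicted class changes.
    mask = top_k_mask(relevance_scores(W, x, label), k)
    delta = np.zeros_like(x)
    for _ in range(steps):
        grad = relaxed_objective_grad(W, b, x, delta, label, lam)
        delta = delta - lr * grad * mask
        if int(np.argmax(W @ (x + delta) + b)) != label:
            break
    return delta, mask
```

Restricting the update with `mask` means only the selected features are ever perturbed, which is one plausible way feature selection can keep the perturbation small; how DI-AA combines the two steps in practice is described in the paper itself.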

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science

Methods: Autoencoders