How explainable are adversarially-robust CNNs?

Mehdi Nourelahi; Lars Kotthoff; Peijie Chen; Anh Nguyen

arXiv:2205.13042 · cs.CV · June 6, 2023 · 6 cites

Open Access

TL;DR

This study evaluates how test-set accuracy, out-of-distribution accuracy, and explainability relate to one another across ImageNet-trained CNNs. Adversarially robust models tend to be more explainable under gradient-based attribution methods, but no single model excels on all three criteria.

Contribution

The first large-scale analysis of how CNN training methods affect accuracy, robustness, and explainability, covering 12 ImageNet-trained CNNs (3 training algorithms, 5 architectures) and 9 feature-importance methods.

Findings

01

Adversarially robust CNNs have higher explainability scores with gradient-based methods.

02

AdvProp models are highly accurate but not more explainable.

03

GradCAM and RISE are the most consistently effective attribution methods.
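
Finding 01 refers to gradient-based attribution methods. As a rough illustration of what such a method computes, the sketch below takes the gradient of a class probability with respect to each input feature and treats it as that feature's importance. The model is a hypothetical two-class linear-softmax classifier chosen so the gradient has a closed form; it is not one of the CNNs or attribution methods evaluated in the paper, and the function names are illustrative only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(weights, bias, x):
    """Class probabilities of a toy linear-softmax model (an assumption, not a CNN)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

def gradient_attribution(weights, bias, x, target):
    """Gradient of the target-class probability with respect to each input feature.

    For logits z = W x + b and p = softmax(z):
        d p_t / d x_i = p_t * (W[t][i] - sum_k p_k * W[k][i])
    """
    probs = predict(weights, bias, x)
    p_t = probs[target]
    grads = []
    for i in range(len(x)):
        expected_w = sum(p * row[i] for p, row in zip(probs, weights))
        grads.append(p_t * (weights[target][i] - expected_w))
    return grads

if __name__ == "__main__":
    # Hypothetical weights: feature 0 supports class 0, features 1 and 2 support class 1.
    W = [[2.0, 0.0, -1.0],
         [0.0, 1.0, 0.5]]
    b = [0.0, 0.0]
    x = [1.0, 0.5, 0.2]
    for i, g in enumerate(gradient_attribution(W, b, x, target=0)):
        print(f"feature {i}: attribution {g:+.4f}")
```

For an image classifier, the same idea yields one gradient value per pixel, which is usually visualized as a heatmap. Automatic differentiation replaces the closed-form gradient used here.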

Abstract

Three important criteria of existing convolutional neural networks (CNNs) are (1) test-set accuracy; (2) out-of-distribution accuracy; and (3) explainability. While these criteria have been studied independently, their relationship is unknown. For example, do CNNs with stronger out-of-distribution performance also have stronger explainability? Furthermore, most prior feature-importance studies evaluate methods on only 2-3 common vanilla ImageNet-trained CNNs, leaving it unknown how these methods generalize to CNNs of other architectures and training algorithms. Here, we perform the first large-scale evaluation of the relations among the three criteria using 9 feature-importance methods and 12 ImageNet-trained CNNs spanning 3 training algorithms and 5 CNN architectures. We report several important insights and recommendations for ML practitioners. First, adversarially robust CNNs…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications

Methods: Batch Normalization · Auxiliary Batch Normalization · AdvProp