How explainable are adversarially-robust CNNs?
Mehdi Nourelahi, Lars Kotthoff, Peijie Chen, Anh Nguyen

TL;DR
This study evaluates the relationships among accuracy, out-of-distribution performance, and explainability in CNNs. It finds that adversarially robust models tend to be more explainable under certain attribution methods, but no single model excels on all criteria.
Contribution
First large-scale analysis of how different CNN training methods affect accuracy, robustness, and explainability across multiple architectures and attribution techniques.
Findings
Adversarially robust CNNs have higher explainability scores with gradient-based methods.
AdvProp models are highly accurate but not more explainable.
GradCAM and RISE are the most consistently effective attribution methods.
Abstract
Three important criteria of existing convolutional neural networks (CNNs) are (1) test-set accuracy, (2) out-of-distribution accuracy, and (3) explainability. While these criteria have been studied independently, their relationship is unknown. For example, do CNNs with stronger out-of-distribution performance also have stronger explainability? Furthermore, most prior feature-importance studies evaluate methods on only 2-3 common vanilla ImageNet-trained CNNs, leaving it unknown how these methods generalize to CNNs of other architectures and training algorithms. Here, we perform the first large-scale evaluation of the relationships among the three criteria, using 9 feature-importance methods and 12 ImageNet-trained CNNs that span 3 training algorithms and 5 CNN architectures. We derive several insights and recommendations for ML practitioners. First, adversarially robust CNNs…
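The abstract asks whether models that score well on one criterion also score well on another. One common way to quantify that kind of relationship across a set of models is Spearman's rank correlation between per-model scores. This is an illustrative assumption and not necessarily the statistic the authors report. Below is a minimal pure-Python sketch. The function names `average_ranks` and `spearman` are hypothetical, and tied scores receive the mean of their rank positions.

```python
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j map to ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5
```

Given one accuracy score and one explainability score per model, `spearman(accuracy_scores, explainability_scores)` returns a value in [-1, 1]. Values near 1 mean the two criteria rank the models similarly. Ranks are used instead of raw scores because accuracy and attribution metrics live on different scales.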
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications
Methods: Batch Normalization · Auxiliary Batch Normalization · AdvProp
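The methods listed above include Auxiliary Batch Normalization, the mechanism behind AdvProp. In AdvProp, clean and adversarial examples share all weights except the batch-normalization layers. Clean batches are normalized by the main BN, adversarial batches by an auxiliary BN, and only the main BN is used at inference.

The pure-Python sketch below illustrates that routing on a (batch, features) matrix. It rests on several assumptions:
- The class names are hypothetical.
- The affine parameters are left untrained.
- The running variance uses the biased batch variance.

Details may therefore differ from real framework implementations.

```python
from typing import List

Matrix = List[List[float]]  # rows are samples, columns are features


class BatchNorm1d:
    """Plain batch norm over a (batch, features) matrix, with running stats."""

    def __init__(self, num_features: int, momentum: float = 0.1, eps: float = 1e-5):
        self.gamma = [1.0] * num_features  # scale (would be learned; fixed here)
        self.beta = [0.0] * num_features   # shift (would be learned; fixed here)
        self.running_mean = [0.0] * num_features
        self.running_var = [1.0] * num_features
        self.momentum, self.eps = momentum, eps

    def __call__(self, x: Matrix, training: bool) -> Matrix:
        n, d = len(x), len(x[0])
        if training:
            # Normalize with this batch's statistics (biased variance).
            mean = [sum(row[j] for row in x) / n for j in range(d)]
            var = [sum((row[j] - mean[j]) ** 2 for row in x) / n for j in range(d)]
            for j in range(d):  # exponential moving average of the batch stats
                self.running_mean[j] += self.momentum * (mean[j] - self.running_mean[j])
                self.running_var[j] += self.momentum * (var[j] - self.running_var[j])
        else:
            mean, var = self.running_mean, self.running_var
        return [
            [self.gamma[j] * (row[j] - mean[j]) / (var[j] + self.eps) ** 0.5 + self.beta[j]
             for j in range(d)]
            for row in x
        ]


class AuxiliaryBatchNorm1d:
    """Main BN for clean batches, auxiliary BN for adversarial batches."""

    def __init__(self, num_features: int):
        self.main = BatchNorm1d(num_features)
        self.aux = BatchNorm1d(num_features)

    def __call__(self, x: Matrix, training: bool, adversarial: bool = False) -> Matrix:
        # During training, adversarial batches update only the auxiliary stats;
        # at inference every input goes through the main branch.
        bn = self.aux if (training and adversarial) else self.main
        return bn(x, training)
```

Keeping two sets of statistics reflects the observation behind AdvProp that clean and adversarial inputs have different feature distributions. Mixing them in one BN layer would blur the statistics the model relies on for clean test images.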
