Assessing the Reliability of Visual Explanations of Deep Models with Adversarial Perturbations
Dan Valle, Tiago Pimentel, Adriano Veloso

TL;DR
This paper introduces an objective measure of the reliability of visual explanations of deep neural networks, based on how adversarial perturbations of the input change the model's output.
Contribution
The paper proposes an objective evaluation method for explanation maps based on adversarial input perturbations, and demonstrates its use in refining relevance maps.
Findings
The proposed measure effectively differentiates explanation methods based on robustness.
Adversarial perturbations reveal the reliability of pixel importance maps.
Refined relevance maps maintain interpretability without losing essential information.
Abstract
The interest in complex deep neural networks for computer vision applications is increasing. This leads to the need for improving the interpretable capabilities of these models. Recent explanation methods present visualizations of the relevance of pixels from input images, thus enabling the direct interpretation of properties of the input that lead to a specific output. These methods produce maps of pixel importance, which are commonly evaluated by visual inspection. This means that the effectiveness of an explanation method is assessed based on human expectation instead of actual feature importance. Thus, in this work we propose an objective measure to evaluate the reliability of explanations of deep models. Specifically, our approach is based on changes in the network's outcome resulting from the perturbation of input images in an adversarial way. We present a comparison between…
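The idea in the abstract, scoring an explanation by how much the network's output changes when the input is perturbed adversarially, can be illustrated with a small sketch. This is not the authors' measure: the linear-softmax model, the FGSM-style sign step, and the names `masked_adversarial_drop`, `k`, and `eps` are all assumptions chosen to keep the example self-contained. It perturbs only the `k` features that a relevance map ranks highest and reports the resulting drop in the target-class probability.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def class_prob(W, x, c):
    """Probability of class c under a toy linear-softmax model (an assumption)."""
    return softmax(W @ x)[c]

def grad_class_prob(W, x, c):
    """Gradient of p_c with respect to x: p_c * (W_c - sum_k p_k W_k)."""
    p = softmax(W @ x)
    return p[c] * (W[c] - p @ W)

def masked_adversarial_drop(W, x, c, relevance, k, eps):
    """Perturb only the k most relevant features against class c.

    Hypothetical score: returns p_c(x) - p_c(x_adv), where x_adv takes a
    sign-gradient step of size eps restricted to the top-k relevance entries.
    """
    top = np.argsort(relevance)[::-1][:k]   # indices of the k largest relevance values
    mask = np.zeros_like(x)
    mask[top] = 1.0
    g = grad_class_prob(W, x, c)
    x_adv = x - eps * np.sign(g) * mask     # step that lowers p_c, masked to top-k
    return class_prob(W, x, c) - class_prob(W, x_adv, c)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 8))             # 3 classes, 8 input features
    x = rng.normal(size=8)
    c = int(np.argmax(W @ x))               # explain the predicted class

    grad_relevance = np.abs(grad_class_prob(W, x, c))   # gradient-magnitude map
    random_relevance = rng.random(8)                    # uninformative baseline map

    print("gradient map drop:", masked_adversarial_drop(W, x, c, grad_relevance, 3, 0.1))
    print("random map drop:  ", masked_adversarial_drop(W, x, c, random_relevance, 3, 0.1))
```

Under this sketch, a map that ranks the truly influential features first would be expected to produce a larger drop than an uninformative ranking, although that is not guaranteed for every input. For an image classifier, the same structure would apply with pixels in place of features and the network's own gradient in place of the closed-form one.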
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Anomaly Detection Techniques and Applications
