TL;DR
This paper introduces new metrics for evaluating saliency explanations in image classification, benchmarks existing methods on ImageNet, and proposes a reliability scheme inspired by psychometric testing to improve XAI evaluation.
Contribution
It develops novel evaluation metrics for saliency methods and proposes a reliability assessment scheme, addressing the limitations of current proxy metrics in XAI.
Findings
New metrics for saliency explanation evaluation
Benchmarking of common saliency methods on ImageNet
A reliability evaluation scheme for explanation metrics
Abstract
The decision processes of computer vision models, especially deep neural networks, are opaque in nature, meaning that their decisions cannot be understood by humans. Thus, in recent years, many methods to provide human-understandable explanations have been proposed. For image classification, the most common group is saliency methods, which provide (super-)pixelwise feature attribution scores for input images. But their evaluation still poses a problem, as their results cannot simply be compared to the unknown ground truth. To overcome this, a slew of different proxy metrics have been defined, which are, like the explainability methods themselves, often built on intuition and thus possibly unreliable. In this paper, new evaluation metrics for saliency methods are developed and common saliency methods are benchmarked on ImageNet. In addition, a scheme for reliability evaluation of…
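To make "(super-)pixelwise feature attribution scores" concrete, here is a minimal sketch of one generic saliency technique, occlusion: mask a patch of the image and record how much the class score drops. It is not the paper's method and not one of its proposed metrics. The names `occlusion_saliency` and `toy_score` are hypothetical, and the toy scoring function stands in for a real classifier.

```python
import numpy as np

def occlusion_saliency(score_fn, image, patch=2, fill=0.0):
    """Attribute to each patch the drop in score when that patch is masked.

    score_fn: callable mapping a 2-D array to a scalar class score (assumed).
    image:    2-D array (a single-channel image, for simplicity).
    """
    h, w = image.shape
    base = score_fn(image)
    saliency = np.zeros((h, w), dtype=float)
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            # A larger score drop means the patch mattered more to the score.
            saliency[top:top + patch, left:left + patch] = base - score_fn(occluded)
    return saliency

def toy_score(img):
    # Hypothetical stand-in for a classifier: only the top-left 2x2 block matters.
    return float(img[:2, :2].sum())

image = np.ones((4, 4))
saliency_map = occlusion_saliency(toy_score, image)
```

In this toy case the "ground truth" is known by construction, so the map can be checked directly: only the top-left block receives a nonzero score. For a real network no such ground truth exists, which is the evaluation problem the abstract describes.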
