What You See is What You Classify: Black Box Attributions
Steven Stalder, Nathanaël Perraudin, Radhakrishna Achanta, Fernando Perez-Cruz, Michele Volpi

TL;DR
This paper introduces a novel method to generate precise, class-specific attribution masks for deep image classifiers by training a secondary network, improving accuracy and efficiency over existing saliency map techniques.
Contribution
We propose training a secondary network to predict attribution masks for a black-box classifier, enabling sharper, class-specific explanations in a single inference step.
Findings
Produces sharper, boundary-precise masks
Generates distinct class-specific masks efficiently
Outperforms existing methods on PASCAL VOC-2007 and COCO-2014
Abstract
An important step towards explaining deep image classifiers lies in the identification of image regions that contribute to individual class scores in the model's output. However, doing this accurately is a difficult task due to the black-box nature of such networks. Most existing approaches find such attributions either using activations and gradients or by repeatedly perturbing the input. We instead address this challenge by training a second deep network, the Explainer, to predict attributions for a pre-trained black-box classifier, the Explanandum. These attributions are provided in the form of masks that only show the classifier-relevant parts of an image, masking out the rest. Our approach produces sharper and more boundary-precise masks when compared to the saliency maps generated by other methods. Moreover, unlike most existing approaches, ours is capable of directly generating…
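The Explainer/Explanandum setup described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration, not the authors' implementation. `explanandum` is a hypothetical toy classifier standing in for the black box, and only its output probabilities are used. The candidate masks are written by hand rather than predicted by a trained Explainer network. `explainer_objective` is a simplified, hypothetical loss: it asks that the masked image still be classified as the target class and that the mask stay small. It does not reproduce the paper's actual training objective. The sketch only shows why a class-specific mask that keeps the relevant region and hides the rest scores best.

```python
import math

# Hypothetical toy "black-box" classifier (stand-in for the Explanandum):
# class 0 responds to brightness in the left half of the image,
# class 1 to brightness in the right half. Only its outputs are used.
def explanandum(image):
    half = len(image[0]) // 2
    left = sum(v for row in image for v in row[:half])
    right = sum(v for row in image for v in row[half:])
    logits = [left, right]
    top = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def apply_mask(image, mask):
    # Keep pixels where the mask is 1 and blank them out where it is 0.
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def mask_area(mask):
    # Fraction of the image that the mask keeps visible.
    n = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / n


def explainer_objective(image, mask, target, area_weight=1.0):
    # Hypothetical simplified objective (not the paper's loss):
    # cross-entropy of the black box on the masked image for `target`,
    # plus a penalty on how much of the image the mask reveals.
    probs = explanandum(apply_mask(image, mask))
    return -math.log(probs[target]) + area_weight * mask_area(mask)


# Toy image: a strong class-0 "object" on the left, a weaker class-1 one on the right.
image = [[2.0, 2.0, 1.0, 1.0] for _ in range(4)]
masks = {
    "left": [[1.0, 1.0, 0.0, 0.0] for _ in range(4)],
    "right": [[0.0, 0.0, 1.0, 1.0] for _ in range(4)],
    "full": [[1.0] * 4 for _ in range(4)],
}
for target in (0, 1):
    best = min(masks, key=lambda name: explainer_objective(image, masks[name], target))
    print(target, best)  # prints "0 left", then "1 right"
```

Each class ends up with a different best mask, which mirrors the class-specific masks the paper describes. In the paper's approach a trained Explainer would produce such masks directly in a single inference step; the search over three fixed candidates above merely stands in for that step.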
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis
