TL;DR
MACE (Model Agnostic Concept Extractor) is a framework that explains image classification networks by extracting smaller, human-interpretable concepts from their feature maps and estimating each concept's relevance to the prediction, with the aim of making explanations both more interpretable and more faithful.
Contribution
The paper introduces a concept extraction method that dissects CNN feature maps and estimates the relevance of each extracted concept to the pre-trained model's predictions.
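As a rough illustration of what "estimating concept relevance" can mean, the sketch below scores each concept by how much the target-class probability drops when that concept's channels are zeroed out. This is a generic ablation heuristic, not the estimator used in the paper. The channel-to-concept grouping, the toy linear classifier head, and every name in the code (`concept_relevance`, `concept_a`, and so on) are assumptions made for the example.

```python
import math


def global_avg_pool(feature_maps):
    """Collapse each H x W channel to its mean activation."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
            for ch in feature_maps]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_probs(pooled, weights):
    """Toy linear head (an assumption): one weight row per class."""
    logits = [sum(w * x for w, x in zip(row, pooled)) for row in weights]
    return softmax(logits)


def concept_relevance(feature_maps, weights, concepts, target):
    """Ablation-style relevance: drop in target probability when a
    concept's channels are zeroed. Not the paper's estimator."""
    pooled = global_avg_pool(feature_maps)
    base = class_probs(pooled, weights)[target]
    relevance = {}
    for name, channels in concepts.items():
        ablated = [0.0 if i in channels else x for i, x in enumerate(pooled)]
        relevance[name] = base - class_probs(ablated, weights)[target]
    return relevance


# Hypothetical toy input: 4 channels of 2 x 2 activations.
feature_maps = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0
    [[2.0, 0.0], [0.0, 2.0]],  # channel 1
    [[0.5, 0.5], [0.5, 0.5]],  # channel 2
    [[0.0, 0.0], [0.0, 0.0]],  # channel 3
]
weights = [
    [2.0, 1.0, 0.0, 0.0],  # class 0 relies on channels 0 and 1
    [0.0, 0.0, 1.0, 1.0],  # class 1 relies on channels 2 and 3
]
# Hypothetical grouping of channels into two concepts.
concepts = {"concept_a": {0, 1}, "concept_b": {2, 3}}

scores = concept_relevance(feature_maps, weights, concepts, target=0)
```

In this toy setup `concept_a` receives a clearly positive score for class 0, while `concept_b` scores slightly negative because removing it takes evidence away from the competing class. A faithful explanation should rank concepts in a way that tracks this kind of effect on the model's output.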
Findings
The extracted concepts increase the human interpretability of the explanations.
The framework is validated on VGG16 and ResNet50.
Concept relevance estimation improves explanation faithfulness.
Abstract
Deep convolutional networks have been quite successful at various image classification tasks. The current methods to explain the predictions of a pre-trained model rely on gradient information, often resulting in saliency maps that focus on the foreground object as a whole. However, humans typically reason by dissecting an image and pointing out the presence of smaller concepts. The final output is often an aggregation of the presence or absence of these smaller concepts. In this work, we propose MACE: a Model Agnostic Concept Extractor, which can explain the working of a convolutional network through smaller concepts. The MACE framework dissects the feature maps generated by a convolutional network for an image to extract concept-based prototypical explanations. Further, it estimates the relevance of the extracted concepts to the pre-trained model's predictions, a critical aspect…
Taxonomy
Methods: Interpretability · Convolution
