Concept Based Explanations and Class Contrasting
Rudolf Herdt, Daniel Otero Baguer

TL;DR
This paper introduces a concept-based explanation method for deep neural networks that explains individual class predictions and contrasts pairs of classes, evaluated qualitatively and quantitatively on ImageNet1K classification models.
Contribution
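The paper presents a novel concept-based explanation technique that explains the prediction for a single class and can also contrast two classes, i.e. explain why a model predicts one class over another, making deep models trained on large datasets more interpretable.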
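The listing does not say how the contrast between two classes is computed. As a rough illustration only, one common framing is to explain the margin between the two class logits. For a linear classification head, this margin splits exactly into per-channel terms. The sketch below assumes NumPy and a linear final layer, and treats each penultimate-layer channel as a stand-in for a "concept". The function name `contrast_contributions` is hypothetical, and this is not the authors' method.

```python
import numpy as np


def contrast_contributions(features, weights, bias, class_a, class_b):
    """Split the logit margin logit[a] - logit[b] of a linear head into
    per-channel contributions.

    Hypothetical helper; treats each penultimate-layer channel as a
    stand-in "concept" (an assumption, not the paper's method).

    features: (d,) penultimate-layer activations for one image
    weights:  (num_classes, d) weight matrix of the final linear layer
    bias:     (num_classes,) bias of the final linear layer
    """
    features = np.asarray(features, dtype=float)
    weights = np.asarray(weights, dtype=float)
    bias = np.asarray(bias, dtype=float)

    # Only the difference of the two weight rows matters for the margin.
    w_diff = weights[class_a] - weights[class_b]
    # Per-channel share: positive values favor class_a, negative favor class_b.
    contributions = w_diff * features
    # Contributions plus the bias difference reproduce the exact logit margin.
    margin = contributions.sum() + bias[class_a] - bias[class_b]
    return contributions, margin


# Toy example with 3 channels and 2 classes.
feats = np.array([1.0, 2.0, 0.5])
W = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
b = np.array([0.1, 0.2])
# logit[0] = 2.1 and logit[1] = 2.7, so the margin is approximately -0.6.
contrib, margin = contrast_contributions(feats, W, b, class_a=0, class_b=1)
```

Channels with large positive contributions push the prediction toward `class_a` over `class_b`, and large negative ones favor `class_b`. A concept-based method would presumably attribute the margin to concept directions rather than to raw channels.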
Findings
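Achieved 91.1% success (911 of 1000 classes) with a ResNet50 from the PyTorch model zoo: four dataset crops selected via the explanation for a class 'A', none of them predicted as 'A' on its own, were combined into a new image that the model again classified as 'A'
Tested the method on several openly available classification models trained on ImageNet1K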
Provided open-source code and examples for reproducibility
Abstract
Explaining deep neural networks is challenging due to their large size and non-linearity. In this paper, we introduce a concept-based explanation method to explain the prediction for an individual class as well as to contrast any two classes, i.e. to explain why the model predicts one class over the other. We test it on several openly available classification models trained on ImageNet1K, performing both qualitative and quantitative tests. For example, for a ResNet50 model from the PyTorch model zoo, we can use the explanation for why the model predicts a class 'A' to automatically select four dataset crops where the model does not predict class 'A'. The model then predicts class 'A' again for the newly combined image in 91.1% of the cases (this works for 911 out of the 1000 classes). The code, including an .ipynb example, is available on GitHub:…
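The quantitative test in the abstract can be sketched as follows. This is a minimal sketch under stated assumptions: NumPy image arrays, a 2x2 grid as the way of "combining" the four crops (the abstract does not specify the layout), and a hypothetical `predict` callable standing in for the ResNet50. The explanation-driven crop selection itself is not shown. The helper names `tile_2x2`, `recombination_success` and `success_rate` are illustrative only.

```python
import numpy as np


def tile_2x2(crops):
    """Arrange four equally sized (H, W, C) crops into one (2H, 2W, C) image.

    The 2x2 grid layout is an assumption; the abstract only says the four
    crops are combined into a new image.
    """
    if len(crops) != 4:
        raise ValueError("expected exactly four crops")
    top = np.concatenate([crops[0], crops[1]], axis=1)     # side by side
    bottom = np.concatenate([crops[2], crops[3]], axis=1)
    return np.concatenate([top, bottom], axis=0)           # stack the rows


def recombination_success(target, crops, predict):
    """Return True if the combined image is predicted as `target`.

    `predict` is a hypothetical callable mapping an image array to a class
    index. Each crop must be predicted as something other than `target`,
    mirroring the selection condition described in the abstract.
    """
    if any(predict(c) == target for c in crops):
        raise ValueError("a crop is already predicted as the target class")
    return predict(tile_2x2(crops)) == target


def success_rate(outcomes):
    """Fraction of classes for which recombination succeeded."""
    outcomes = list(outcomes)
    return sum(outcomes) / len(outcomes)


def toy_predict(img):
    """Toy stand-in "model": predicts class 1 only for images >= 4 px tall."""
    return 1 if img.shape[0] >= 4 else 0


toy_crops = [np.zeros((2, 2, 3)) for _ in range(4)]
print(recombination_success(1, toy_crops, toy_predict))  # True
```

Running `recombination_success` once per class and passing the outcomes to `success_rate` gives the reported kind of figure: 911 successes out of 1000 classes is 911 / 1000 = 0.911, i.e. 91.1%. A real pipeline would presumably also resize the combined image to the model's input resolution; that step is omitted here.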
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications · Adversarial Robustness in Machine Learning
