Unsupervised Learning of Neural Networks to Explain Neural Networks (extended abstract)
Quanshi Zhang, Yu Yang, Ying Nian Wu

TL;DR
This paper introduces an unsupervised approach to interpret CNN features by learning an explainer network that decomposes feature maps into object-part concepts, enhancing interpretability without sacrificing accuracy.
Contribution
The method learns an explainer network through knowledge distillation, without any annotations of object parts or textures. This gives a new way to interpret CNN features in terms of interpretable visual concepts.
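The toy sketch below shows the label-free training signal. It assumes a frozen linear map as a stand-in for a pre-trained conv-layer. The student is fit only to the teacher's outputs on unlabeled inputs, so no annotations appear anywhere. `TEACHER_WEIGHTS`, `teacher_features`, and `distill` are hypothetical names. This is generic feature-mimicking distillation, not the authors' training procedure or architecture.

```python
import random

# Hypothetical stand-in for a frozen, pre-trained conv-layer: a fixed 2x2 linear map.
TEACHER_WEIGHTS = [[2.0, -1.0], [0.5, 3.0]]


def teacher_features(x):
    """Frozen 'teacher' features; in the paper these would come from the pre-trained CNN."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in TEACHER_WEIGHTS]


def distill(num_steps=2000, lr=0.05, seed=0):
    """Train a student linear map to mimic the teacher using unlabeled inputs only."""
    rng = random.Random(seed)
    student = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(num_steps):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]  # unlabeled input
        target = teacher_features(x)  # the only supervision: the teacher's own features
        pred = [sum(w * xi for w, xi in zip(row, x)) for row in student]
        # SGD on 0.5 * ||pred - target||^2; d/d student[i][j] = (pred_i - target_i) * x_j
        for i in range(2):
            err = pred[i] - target[i]
            for j in range(2):
                student[i][j] -= lr * err * x[j]
    return student
```

Calling `distill()` returns a student matrix that matches `TEACHER_WEIGHTS` to within floating-point tolerance. The teacher alone supplied every target.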
Findings
The explainer effectively decomposes CNN features into object parts.
Interpretability is improved without reducing the CNN's discrimination power.
The method works across different benchmark CNNs.
Abstract
This paper presents an unsupervised method to learn a neural network, namely an explainer, to interpret a pre-trained convolutional neural network (CNN); i.e., the explainer uses interpretable visual concepts to explain features in middle conv-layers of a CNN. Given feature maps of a conv-layer of the CNN, the explainer works like an auto-encoder that decomposes the feature maps into object-part features. The object-part features are learned to reconstruct CNN features without much loss of information. The disentangled representations of object parts can be regarded as a paraphrase of CNN features, which helps people understand the knowledge encoded by the CNN. More crucially, we learn the explainer via knowledge distillation without using any annotations of object parts or textures for supervision. In experiments, our method was applied to interpret features of different benchmark…
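The toy sketch below illustrates the auto-encoder idea. It assumes fixed, hand-picked unit-norm part templates, whereas the paper learns its explainer. `PART_TEMPLATES` and the helper functions are hypothetical. The encoder maps each position's channel vector to nonnegative part activations. The decoder rebuilds the vector from those activations. The relative error then measures how much of the feature map the part decomposition fails to explain.

```python
import math

# Hypothetical "part templates": unit-norm directions in a 3-channel feature space.
# These are illustrative assumptions, not the filters learned in the paper.
PART_TEMPLATES = {
    "head": [1.0, 0.0, 0.0],
    "torso": [0.0, 1.0, 0.0],
}


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def encode(feature_map):
    """Encoder: for each spatial position, a nonnegative activation per part template."""
    return [{name: max(0.0, dot(f, t)) for name, t in PART_TEMPLATES.items()} for f in feature_map]


def decode(part_maps, channels):
    """Decoder: rebuild each feature vector as an activation-weighted sum of templates."""
    out = []
    for parts in part_maps:
        vec = [0.0] * channels
        for name, act in parts.items():
            for c, t in enumerate(PART_TEMPLATES[name]):
                vec[c] += act * t
        out.append(vec)
    return out


def relative_reconstruction_error(feature_map):
    """||F - decode(encode(F))|| / ||F|| over all positions and channels."""
    channels = len(feature_map[0])
    recon = decode(encode(feature_map), channels)
    err = sum((a - b) ** 2 for f, r in zip(feature_map, recon) for a, b in zip(f, r))
    total = sum(a ** 2 for f in feature_map for a in f)
    return math.sqrt(err / total)
```

Consider a two-position feature map `[[3.0, 4.0, 0.5], [0.0, 2.0, 0.0]]`. Only the 0.5 in the third channel lies outside the templates, so the relative error is about 0.092. A learned explainer would instead fit its part filters so that this residual stays small.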
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Adversarial Robustness in Machine Learning
Methods: Knowledge Distillation · Interpretability
