Explaining Deep Learning Hidden Neuron Activations using Concept Induction
Abhilekha Dalal, Md Kamruzzaman Sarker, Adrita Barua, and Pascal Hitzler

TL;DR
This paper introduces a method that uses large-scale background knowledge and symbolic reasoning to automatically interpret hidden neuron activations in deep learning models, enhancing explainability.
Contribution
It presents a novel automated approach combining concept induction and background knowledge to interpret hidden neurons in neural networks.
Findings
Automatically attaches meaningful labels to neurons
Uses Wikipedia concept hierarchy for interpretation
Demonstrates effectiveness in CNN hidden layers
Abstract
One of the current key challenges in Explainable AI is in correctly interpreting activations of hidden neurons. It seems evident that accurate interpretations thereof would provide insights into the question of what a deep learning system has internally detected as relevant on the input, thus lifting some of the black box character of deep learning systems. The state of the art on this front indicates that hidden node activations appear to be interpretable in a way that makes sense to humans, at least in some cases. Yet, systematic automated methods that would be able to first hypothesize an interpretation of hidden neuron activations, and then verify it, are mostly missing. In this paper, we provide such a method and demonstrate that it provides meaningful interpretations. It is based on using large-scale background knowledge -- a class hierarchy of approx. 2 million classes…
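The "first hypothesize, then verify" loop described in the abstract can be illustrated with a small sketch. This is not the authors' method: it swaps concept induction over a class hierarchy for a simple frequency count over per-image concept annotations, and every name in it (`hypothesize_label`, `verify_label`, the activation threshold, the toy data) is a hypothetical stand-in chosen for illustration.

```python
from collections import Counter

def hypothesize_label(activations, image_concepts, threshold):
    """Propose a label for one neuron (frequency-based stand-in, NOT concept induction).

    A concept scores +1 for each activating image it appears in and -1 for
    each non-activating image it appears in; the best-scoring concept wins.
    """
    pos, neg = Counter(), Counter()
    for act, concepts in zip(activations, image_concepts):
        target = pos if act >= threshold else neg
        target.update(set(concepts))
    scores = {c: pos[c] - neg[c] for c in pos}
    if not scores:
        return None
    # Sort first so that ties are broken alphabetically and deterministically.
    return max(sorted(scores), key=lambda c: scores[c])

def verify_label(label, activations, image_concepts, threshold):
    """Fraction of held-out images containing `label` that activate the neuron."""
    hits = [act >= threshold
            for act, concepts in zip(activations, image_concepts)
            if label in concepts]
    return sum(hits) / len(hits) if hits else 0.0

# Toy data (assumed): activations of a single neuron and concept tags per image.
train_acts = [0.9, 0.8, 0.1, 0.2]
train_concepts = [{"building", "sky"}, {"building", "road"},
                  {"sky", "tree"}, {"road", "tree"}]
held_out_acts = [0.7, 0.3, 0.6]
held_out_concepts = [{"building"}, {"building", "tree"}, {"tree"}]

label = hypothesize_label(train_acts, train_concepts, threshold=0.5)
score = verify_label(label, held_out_acts, held_out_concepts, threshold=0.5)
print(label, score)
```

Keeping the hypothesis step and the verification step on separate image sets mirrors the separation the abstract calls for: a label is only trusted if images it was not derived from also activate the neuron.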
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Natural Language Processing Techniques
