Explaining Deep Learning Hidden Neuron Activations using Concept Induction

Abhilekha Dalal, Md Kamruzzaman Sarker, Adrita Barua, and Pascal Hitzler

arXiv:2301.09611·cs.LG·January 24, 2023

TL;DR

This paper introduces a method that uses large-scale background knowledge and symbolic reasoning to automatically interpret hidden neuron activations in deep learning models, enhancing explainability.

Contribution

It presents a novel automated approach combining concept induction and background knowledge to interpret hidden neurons in neural networks.

Findings

01 Automatically attaches meaningful labels to neurons

02 Uses Wikipedia concept hierarchy for interpretation

03 Demonstrates effectiveness in CNN hidden layers

Abstract

One of the current key challenges in Explainable AI is correctly interpreting the activations of hidden neurons. It seems evident that accurate interpretations would provide insight into what a deep learning system has internally detected as relevant in the input, thus lifting some of the black-box character of deep learning systems. The state of the art on this front indicates that hidden node activations appear to be interpretable in a way that makes sense to humans, at least in some cases. Yet systematic automated methods that first hypothesize an interpretation of hidden neuron activations and then verify it are mostly missing. In this paper, we provide such a method and demonstrate that it provides meaningful interpretations. It is based on using large-scale background knowledge -- a class hierarchy of approx. 2 million classes…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Natural Language Processing Techniques