An Interpretable Neuron Embedding for Static Knowledge Distillation

Wei Han; Yangqiming Wang; Christian Böhm; Junming Shao

arXiv:2211.07647 · cs.LG · November 16, 2022

TL;DR

This paper introduces an interpretable neural network approach that embeds neurons into a semantic space, externalizing latent knowledge into static vectors for better interpretability and effective knowledge distillation.

Contribution

The paper proposes a novel neuron embedding method that externalizes latent knowledge into static semantic vectors, improving interpretability and distillation performance.

Findings

01. Semantic vectors effectively describe neuron activation semantics.

02. Static knowledge distillation achieves comparable or better results than relation-based methods.

03. Visualization of semantic vectors provides qualitative explanations of neural networks.

Abstract

Although deep neural networks perform well on a variety of tasks, their poor interpretability is often criticized. In this paper, we propose a new interpretable neural network method that embeds neurons into a semantic space to extract their intrinsic global semantics. In contrast to previous methods that probe latent knowledge inside the model, the proposed semantic vectors externalize the latent knowledge as static knowledge, which is easy to exploit. Specifically, we assume that neurons with similar activations carry similar semantic information. Semantic vectors are then optimized by continuously aligning activation similarity with semantic vector similarity during the training of the neural network. The visualization of semantic vectors allows for a qualitative explanation of the neural network. Moreover, we assess the static knowledge quantitatively…
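The alignment idea in the abstract can be illustrated with a small sketch. The abstract does not state which similarity measure or loss the authors use, so the cosine similarity and the mean squared difference below are assumptions, and the names `cosine_similarity_matrix` and `alignment_loss` are hypothetical. This is not the paper's implementation, only one plausible reading of "aligning activation similarity and semantic vector similarity".

```python
import numpy as np


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return unit @ unit.T


def alignment_loss(activations, semantic_vectors):
    """Mean squared gap between the two similarity matrices.

    ASSUMPTION: cosine similarity and a squared-error penalty; the
    abstract does not specify either choice.

    activations:      (n_neurons, n_samples) activation of each neuron per input
    semantic_vectors: (n_neurons, dim) one embedding per neuron
    """
    act_sim = cosine_similarity_matrix(activations)
    sem_sim = cosine_similarity_matrix(semantic_vectors)
    return float(np.mean((act_sim - sem_sim) ** 2))


# Toy data: 4 neurons observed on 32 inputs, embedded in 8 dimensions.
rng = np.random.default_rng(0)
activations = rng.normal(size=(4, 32))
random_vectors = rng.normal(size=(4, 8))

# Random embeddings do not reproduce the activation similarity structure.
loss_random = alignment_loss(activations, random_vectors)

# Sanity check: embeddings with identical similarity structure give zero loss.
loss_matched = alignment_loss(activations, activations)
```

In a training loop, a term of this kind would presumably be minimized with respect to the semantic vectors alongside the task loss, so that neurons that fire together end up close in the semantic space.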

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Neural Networks and Applications · Domain Adaptation and Few-Shot Learning

Methods: Knowledge Distillation