Interpreting Layered Neural Networks via Hierarchical Modular Representation
Chihiro Watanabe

TL;DR
This paper introduces a hierarchical clustering approach to interpret layered neural networks, revealing their global structure and relationships among units based on their correlation with input and output features.
Contribution
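It proposes a hierarchical clustering method that uncovers tree-structured relationships among neural network units. This addresses two limitations of earlier clustering approaches: the appropriate number of clusters was not known in advance, and there was no way to tell whether a cluster's outputs correlate positively or negatively with the input and output dimensions.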
Findings
Revealed hierarchical relationships among network units
Provided a method to interpret positive and negative correlations
Enhanced understanding of neural network global structure
Abstract
Interpreting the prediction mechanism of complex models is currently one of the most important tasks in the machine learning field, especially with layered neural networks, which have achieved high predictive performance with various practical data sets. To reveal the global structure of a trained neural network in an interpretable way, a series of clustering methods have been proposed, which decompose the units into clusters according to the similarity of their inference roles. These studies had two main problems: (1) there was no prior knowledge of the optimal resolution for the decomposition, that is, the appropriate number of clusters, and (2) there was no method for determining whether the outputs of each cluster correlate positively or negatively with the input and output dimension values. In this paper, to solve these problems, we propose a method…
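The abstract is cut off before the method is described, so the following is only an illustrative sketch of the general idea named in the TL;DR, not the paper's algorithm. It assumes each unit is summarized by its signed Pearson correlation with every input/output dimension, and that units are then merged bottom-up with average-linkage clustering on Euclidean distance. The toy data, the linkage choice, and all function names (`pearson`, `unit_features`, `agglomerate`, `cut`) are assumptions made for illustration.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def unit_features(unit_outputs, dims):
    """One signed correlation vector per unit (assumed representation)."""
    return [[pearson(u, d) for d in dims] for u in unit_outputs]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(features):
    """Average-linkage agglomerative clustering (assumed linkage).

    Returns the merge history as (members_a, members_b, distance) tuples,
    which together describe a binary tree over the units.
    """
    clusters = [[i] for i in range(len(features))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(euclidean(features[a], features[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

def cut(merges, n_units, k):
    """Cut the tree at k clusters by replaying the first n_units - k merges."""
    clusters = [[i] for i in range(n_units)]
    for a, b, _ in merges[: n_units - k]:
        clusters = [c for c in clusters if sorted(c) != a and sorted(c) != b]
        clusters.append(a + b)
    return sorted(sorted(c) for c in clusters)

# Toy data (hypothetical): two input dimensions and five "units".
random.seed(0)
x0 = [random.gauss(0, 1) for _ in range(200)]
x1 = [random.gauss(0, 1) for _ in range(200)]
units = [
    x0,                          # unit 0: follows x0
    [3 * v + 1 for v in x0],     # unit 1: follows x0 (scaled and shifted)
    [-v for v in x0],            # unit 2: negatively correlated with x0
    x1,                          # unit 3: follows x1
    [0.5 * v for v in x1],       # unit 4: follows x1 (scaled)
]

features = unit_features(units, [x0, x1])
merges = agglomerate(features)
for k in (4, 3, 2):
    print(k, "clusters:", cut(merges, len(units), k))
```

Because the full merge history is kept, the same tree can be cut at any number of clusters after the fact, which is how a hierarchical view sidesteps fixing the resolution in advance. The sign of each correlation is preserved in the feature vectors, so in this toy setup unit 2 (negatively correlated with the first input) stays apart from units 0 and 1 when the tree is cut at three clusters.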
Taxonomy
Topics: Neural Networks and Applications
