
TL;DR
This paper proposes a framework for interpreting the neurons of a trained neural network: observer models are trained to classify the network's state, revealing how it encodes salient input properties and how these properties relate to neuron depth.
Contribution
It introduces a novel method for understanding neural network semantics through observer models and visualizations, providing insights into neuron properties and their depth-dependent behavior.
Findings
Trained neural networks capture salient input properties, some of which are encoded by individual neurons.
Heat maps of linear observer models visualize the relevance of individual object-model components.
Neuron properties vary systematically with network depth.
Abstract
We introduce a framework for reasoning about what meaning is captured by the neurons in a trained neural network. We provide a strategy for discovering meaning by training a second model (referred to as an observer model) to classify the state of the model it observes (an object model) in relation to attributes of the underlying dataset. We implement and evaluate observer models in the context of a specific set of classification problems, employ heat maps for visualizing the relevance of components of an object model in the context of linear observer models, and use these visualizations to extract insights about the manner in which neural networks identify salient characteristics of their inputs. We identify important properties captured decisively in trained neural networks; some of these properties are denoted by individual neurons. Finally, we observe that the label proportion of a…
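To make the observer idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The object model is a hypothetical, hand-built hidden layer of four ReLU neurons standing in for a trained network. The dataset attribute (whether the first input coordinate is positive), the toy data, and all function names are likewise assumptions made for illustration. A linear observer (logistic regression) is trained to predict the attribute from the neuron states, and the absolute observer weights are printed as a one-row text heat map of neuron relevance.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def object_state(x):
    # Hypothetical stand-in for a trained object model's hidden layer:
    # four ReLU neurons responding to +x0, -x0, +x1 and -x1.
    return [max(0.0, x[0]), max(0.0, -x[0]), max(0.0, x[1]), max(0.0, -x[1])]

def observer_score(weights, bias, state):
    # Linear observer: weighted sum of neuron states plus a bias.
    return bias + sum(w * s for w, s in zip(weights, state))

def train_linear_observer(states, labels, epochs=50, lr=0.1):
    # Logistic regression fitted with plain stochastic gradient descent.
    weights, bias = [0.0] * len(states[0]), 0.0
    for _ in range(epochs):
        for state, label in zip(states, labels):
            grad = sigmoid(observer_score(weights, bias, state)) - label
            weights = [w - lr * grad * s for w, s in zip(weights, state)]
            bias -= lr * grad
    return weights, bias

# Toy dataset (assumed): the attribute of interest is "x0 is positive".
rng = random.Random(0)
inputs = [(rng.choice([-1, 1]) * rng.uniform(0.2, 1.0), rng.uniform(-1.0, 1.0))
          for _ in range(200)]
labels = [1 if x[0] > 0 else 0 for x in inputs]
states = [object_state(x) for x in inputs]

weights, bias = train_linear_observer(states, labels)
predictions = [1 if observer_score(weights, bias, s) >= 0.0 else 0 for s in states]
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Relevance of each neuron to the attribute: magnitude of its observer weight.
relevance = [abs(w) for w in weights]
peak = max(relevance)
print(f"observer accuracy: {accuracy:.2f}")
for i, r in enumerate(relevance):
    print(f"neuron {i}: {'#' * round(10 * r / peak):<10} {r:.2f}")
```

In this toy setup, the neurons that respond to the first coordinate should receive the largest observer weights. That is the kind of reading a relevance heat map is meant to support. With a real network, the states would come from recorded activations rather than a hand-built layer.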
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Neural Networks and Applications · Generative Adversarial Networks and Image Synthesis
