TL;DR
This paper introduces ClaDec, a neural network interpretability method that reconstructs layer representations into human-understandable images, improving insight into what neural network layers encode, especially for image classification tasks.
Contribution
ClaDec is an architecture that decodes the activations of a chosen neural network layer into images. Comparing these reconstructions with those of a conventional auto-encoder, which serves as a reference, shows what information the layer encodes.
Findings
Images reconstructed by ClaDec from classifier encodings capture more classification-relevant information than reconstructions from conventional auto-encoders.
An extension of ClaDec allows a trade-off between human interpretability and fidelity.
The approach improves understanding of neural network internal representations.
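The trade-off between interpretability and fidelity can be illustrated with a small sketch. It assumes the trade-off is a weighted sum of a reconstruction term and a classification term computed on the reconstructed image. The weight `alpha`, the function names, and the choice of mean squared error and cross-entropy are illustrative assumptions, not details taken from the paper.

```python
import math

def mse(x, x_hat):
    """Mean squared reconstruction error between two flat pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def cross_entropy(logits, label):
    """Softmax cross-entropy for the true class index (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def combined_loss(x, x_hat, logits_on_x_hat, label, alpha):
    """Hypothetical trade-off loss (an assumption, not the paper's formula).

    alpha = 0 keeps only the reconstruction term,
    alpha = 1 keeps only the classification term.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rec = mse(x, x_hat)
    cls = cross_entropy(logits_on_x_hat, label)
    return (1.0 - alpha) * rec + alpha * cls
```

Under this assumption, sweeping `alpha` from 0 to 1 moves the decoder's objective from reproducing the input pixels toward producing images that the classifier labels correctly.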
Abstract
We present a 'CLAssifier-DECoder' architecture (ClaDec) which facilitates the comprehension of the output of an arbitrary layer in a neural network (NN). It uses a decoder to transform the non-interpretable representation of the given layer to a representation that is more similar to the domain a human is familiar with. In an image recognition problem, one can recognize what information is represented by a layer by contrasting reconstructed images of ClaDec with those of a conventional auto-encoder (AE) serving as reference. We also extend ClaDec to allow the trade-off between human interpretability and fidelity. We evaluate our approach for image classification using Convolutional NNs. We show that reconstructed visualizations using encodings from a classifier capture more relevant information for classification than conventional AEs. Relevant code is available at…
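The core idea in the abstract, training a decoder on the output of a fixed classifier layer, can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the "classifier layer" is a hand-picked linear map followed by ReLU, the decoder is a single linear map trained by stochastic gradient descent on squared error, and all names (`W_enc`, `train_decoder`, `recon_loss`) are hypothetical.

```python
import random

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, v):
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def layer_activations(W_enc, x):
    """Output of the (frozen) classifier layer we want to explain."""
    return relu(matvec(W_enc, x))

def recon_loss(D, W_enc, data):
    """Mean squared error of decoding layer activations back to inputs."""
    total = 0.0
    for x in data:
        x_hat = matvec(D, layer_activations(W_enc, x))
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x)) / len(x)
    return total / len(data)

def train_decoder(W_enc, data, steps=200, lr=0.05, seed=0):
    """Train only the decoder D; the classifier weights W_enc stay fixed."""
    rng = random.Random(seed)
    n_in, n_hid = len(data[0]), len(W_enc)
    D = [[rng.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_in)]
    for _ in range(steps):
        for x in data:
            h = layer_activations(W_enc, x)
            x_hat = matvec(D, h)
            for i in range(n_in):
                g = 2.0 * (x_hat[i] - x[i]) / n_in  # d(loss)/d(x_hat[i])
                for j in range(n_hid):
                    D[i][j] -= lr * g * h[j]
    return D

# Stand-in for a trained classifier layer (4 inputs, 3 hidden units).
W_enc = [[0.5, 0.5, 0.0, 0.0],
         [0.0, 0.0, 0.5, 0.5],
         [0.25, 0.25, 0.25, 0.25]]
rng = random.Random(1)
data = [[rng.random() for _ in range(4)] for _ in range(20)]

D0 = train_decoder(W_enc, data, steps=0)   # untrained decoder
D = train_decoder(W_enc, data, steps=200)  # trained decoder
```

A reference auto-encoder in this setting would train its encoder and decoder jointly on reconstruction alone. Contrasting its reconstructions with those decoded from the frozen classifier layer is what indicates which input information the layer retains.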
Taxonomy
Methods: Interpretability
