Explaining Neural Networks by Decoding Layer Activations

Johannes Schneider; Michalis Vlachos

arXiv:2005.13630 · cs.LG · March 1, 2021

TL;DR

This paper introduces ClaDec, an interpretability method that decodes the activations of a neural network layer into images a human can inspect, giving insight into what that layer encodes in an image classifier.

Contribution

ClaDec is a classifier-decoder architecture that decodes the activations of a chosen layer into images. Contrasting these reconstructions with those of a conventional auto-encoder, used as a reference, shows what information the layer represents.

Findings

01 Images reconstructed from classifier encodings capture more classification-relevant information than those from conventional auto-encoders.

02 An extension of ClaDec allows a trade-off between human interpretability and fidelity.

03 Comparing ClaDec reconstructions with those of a reference auto-encoder shows what information a given layer represents.

Abstract

We present a 'CLAssifier-DECoder' architecture (ClaDec) which facilitates the comprehension of the output of an arbitrary layer in a neural network (NN). It uses a decoder to transform the non-interpretable representation of the given layer to a representation that is more similar to the domain a human is familiar with. In an image recognition problem, one can recognize what information is represented by a layer by contrasting reconstructed images of ClaDec with those of a conventional auto-encoder (AE) serving as reference. We also extend ClaDec to allow the trade-off between human interpretability and fidelity. We evaluate our approach for image classification using Convolutional NNs. We show that reconstructed visualizations using encodings from a classifier capture more relevant information for classification than conventional AEs. Relevant code is available at…
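The idea described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the `SmallClassifier` and `Decoder` architectures, the `cladec_loss` helper, the 28x28 grayscale input size, and the `alpha` weighting between a reconstruction term and a classification term are all assumptions made for illustration. In particular, the abstract only states that a trade-off between interpretability and fidelity exists; the weighted sum below is one plausible way to express it.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)


class SmallClassifier(nn.Module):
    """Hypothetical stand-in for the classifier whose layer is explained."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),   # 28x28 -> 14x14
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),  # 14x14 -> 7x7
        )
        self.head = nn.Linear(16 * 7 * 7, num_classes)

    def encode(self, x):
        # Activations of the layer we want to explain.
        return self.features(x)

    def forward(self, x):
        return self.head(self.encode(x).flatten(1))


class Decoder(nn.Module):
    """Hypothetical decoder mapping layer activations back to image space."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),    # 7x7 -> 14x14
            nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1), nn.Sigmoid(),  # 14x14 -> 28x28
        )

    def forward(self, z):
        return self.net(z)


def cladec_loss(classifier, decoder, x, y, alpha=1.0):
    """Assumed loss: alpha * reconstruction + (1 - alpha) * classification."""
    with torch.no_grad():
        z = classifier.encode(x)          # the classifier acts as a fixed encoder
    x_hat = decoder(z)
    rec = F.mse_loss(x_hat, x)            # how closely the decoded image matches the input
    cls = F.cross_entropy(classifier(x_hat), y)  # how well the classifier handles the decoded image
    return alpha * rec + (1.0 - alpha) * cls


# In practice the classifier would be trained first; here it is randomly
# initialised only so that the sketch runs end to end.
classifier = SmallClassifier().eval()
for p in classifier.parameters():
    p.requires_grad_(False)               # only the decoder is updated

decoder = Decoder()
optimizer = torch.optim.Adam(decoder.parameters(), lr=1e-3)

x = torch.rand(4, 1, 28, 28)              # dummy batch of images in [0, 1]
y = torch.randint(0, 10, (4,))            # dummy labels

loss = cladec_loss(classifier, decoder, x, y, alpha=0.9)
optimizer.zero_grad()
loss.backward()
optimizer.step()

with torch.no_grad():
    x_hat = decoder(classifier.encode(x))  # decoded view of the layer's activations
```

Under the same assumptions, the reference auto-encoder mentioned in the abstract would reuse the encoder and decoder architectures but train both jointly on the reconstruction term alone, so that differences between its outputs and `x_hat` can be attributed to what the classifier's layer keeps or discards.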

Code & Models

Repositories

JohnTailor/ClaDec (PyTorch, official)

Taxonomy

Methods: Interpretability