Exploring layerwise decision making in DNNs
Coenraad Mouton, Marelie H. Davel

TL;DR
This paper introduces a method to interpret deep neural networks by encoding layer activations into decision trees, enabling layerwise explanations and analysis of the model's decision process.
Contribution
It presents an approach for extracting a decision tree from each layer of a ReLU-activated MLP by encoding node activations as binary values, making the layerwise classification procedure easier to interpret.
Findings
Decision trees can effectively explain layerwise decisions in DNNs.
Binary encoding of activations reveals insights into model behavior.
Interpretations correlate with training sample groupings.
Abstract
While deep neural networks (DNNs) have become a standard architecture for many machine learning tasks, their internal decision-making process and general interpretability are still poorly understood. Conversely, common decision trees are easily interpretable and theoretically well understood. We show that by encoding the discrete sample activation values of nodes as a binary representation, we are able to extract a decision tree explaining the classification procedure of each layer in a ReLU-activated multilayer perceptron (MLP). We then combine these decision trees with existing feature attribution techniques in order to produce an interpretation of each layer of a model. Finally, we provide an analysis of the generated interpretations, the behaviour of the binary encodings and how these relate to sample groupings created during the training process of the neural network.
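To make the binary-encoding idea concrete, here is a minimal sketch, not the authors' implementation. It assumes that "binary representation" means recording whether each ReLU node is active (output greater than zero) or inactive for a given sample, and it groups samples that share the same on/off pattern in one layer. The toy weights, the inputs, and the helper names `relu_layer`, `binary_encoding`, and `group_by_code` are all hypothetical and chosen for illustration only.

```python
from collections import defaultdict

import numpy as np


def relu_layer(x, weights, bias):
    """Forward pass through one fully connected ReLU layer."""
    return np.maximum(0.0, x @ weights + bias)


def binary_encoding(activations):
    """Assumed encoding: 1 where a node is active (> 0), otherwise 0."""
    return (activations > 0).astype(np.uint8)


def group_by_code(codes):
    """Map each distinct binary pattern to the indices of samples sharing it."""
    groups = defaultdict(list)
    for index, row in enumerate(codes):
        key = "".join(str(int(bit)) for bit in row)
        groups[key].append(index)
    return dict(groups)


# Hypothetical toy layer: 2 inputs, 3 hidden nodes.
weights = np.array([[1.0, -1.0, 0.5],
                    [-1.0, 1.0, 0.5]])
bias = np.array([0.0, 0.0, -1.0])

# Four hypothetical samples.
samples = np.array([[2.0, 0.0],
                    [0.0, 2.0],
                    [2.0, 2.0],
                    [1.0, 0.0]])

codes = binary_encoding(relu_layer(samples, weights, bias))
groups = group_by_code(codes)
print(groups)
```

In this toy layer, samples 0 and 3 activate only the first node and so share a pattern, while samples 1 and 2 each get their own. Each binary column can then serve as a yes/no split feature, which is the kind of input a standard decision tree learner could be fitted on; that fitting step is left out here.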
