Analyzing Representations inside Convolutional Neural Networks
Uday Singh Saini, Evangelos E. Papalexakis

TL;DR
This paper introduces an unsupervised framework to analyze and summarize the concepts learned by convolutional neural networks by clustering internal activations and input features, making the learned representations more interpretable.
Contribution
The work presents a novel unsupervised method to categorize neural network concepts based on internal activations, applicable without labeled data.
Findings
Produces human-understandable concepts
Effective on ResNet-18 with CIFAR-100
Clustering reveals coherent learned representations
Abstract
How can we discover and succinctly summarize the concepts that a neural network has learned? This task is of great importance in applications that rely on classification, such as medical diagnosis from fMRI or X-ray images. In this work, we propose a framework to categorize the concepts a network learns by clustering input examples, neurons (according to the examples they activate for), and input features, all in the same latent space. The framework is unsupervised and works without any labels for input features; it needs only access to the network's internal activations for each input example, which makes it widely applicable. We extensively evaluate the proposed method and demonstrate that it produces human-understandable and coherent concepts that a ResNet-18 has learned on the CIFAR-100 dataset.
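To make the idea of placing examples, neurons, and input features "in the same latent space" concrete, here is a minimal sketch. It is not the authors' method, whose details the abstract does not give. It assumes a simple coupled non-negative matrix factorization: an activation matrix (examples x neurons) and an input matrix (examples x features) are stacked side by side and factorized with a shared example factor. The names `coupled_nmf` and `assign_concepts`, the rank, and the multiplicative-update solver are all illustrative choices.

```python
import numpy as np


def coupled_nmf(activations, features, rank=3, iters=200, seed=0):
    """Factorize [activations | features] ~= W @ H with non-negative factors.

    activations: (n_examples, n_neurons) non-negative array
    features:    (n_examples, n_features) non-negative array
    Returns embeddings for examples, neurons, and input features that
    all live in the same `rank`-dimensional latent space.
    (Hypothetical helper for illustration, not the paper's algorithm.)
    """
    rng = np.random.default_rng(seed)
    # Coupling: both matrices share the example mode, so stack them column-wise.
    joint = np.hstack([activations, features])
    n_rows, n_cols = joint.shape
    W = rng.random((n_rows, rank)) + 1e-3
    H = rng.random((rank, n_cols)) + 1e-3
    eps = 1e-9  # avoids division by zero
    for _ in range(iters):
        # Multiplicative updates for the squared Frobenius reconstruction error.
        H *= (W.T @ joint) / (W.T @ W @ H + eps)
        W *= (joint @ H.T) / (W @ H @ H.T + eps)
    n_neurons = activations.shape[1]
    example_emb = W
    neuron_emb = H[:, :n_neurons].T
    feature_emb = H[:, n_neurons:].T
    return example_emb, neuron_emb, feature_emb


def assign_concepts(embedding):
    """Assign each row to the latent component where it has the largest weight."""
    return embedding.argmax(axis=1)


if __name__ == "__main__":
    # Synthetic stand-ins for ReLU activations and non-negative pixel features.
    rng = np.random.default_rng(1)
    acts = rng.random((50, 16))
    pixels = rng.random((50, 32))
    ex, neu, feat = coupled_nmf(acts, pixels, rank=4)
    print("example clusters:", assign_concepts(ex)[:10])
    print("neuron clusters: ", assign_concepts(neu))
```

Because all three embeddings share the same latent components, a single component index can be read as one candidate "concept": the examples, neurons, and input features that load most heavily on it. Like the framework described in the abstract, the sketch needs no labels, only activations and inputs.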
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
