Analyzing Representations inside Convolutional Neural Networks

Uday Singh Saini; Evangelos E. Papalexakis

arXiv:2012.12516·cs.LG·December 24, 2020

Open Access

TL;DR

This paper introduces an unsupervised framework to analyze and summarize the concepts learned by convolutional neural networks by clustering internal activations and input features, making the learned representations more interpretable.

Contribution

The work presents a novel unsupervised method to categorize neural network concepts based on internal activations, applicable without labeled data.

Findings

01 Produces human-understandable concepts

02 Demonstrated on a ResNet-18 trained on CIFAR-100

03 Clustering reveals coherent learned representations

Abstract

How can we discover and succinctly summarize the concepts that a neural network has learned? This task is of great importance wherever networks are applied to classification-based inference, such as medical diagnosis from fMRI or X-ray images. In this work, we propose a framework that categorizes the concepts a network learns by clustering input examples, neurons (according to the examples they activate for), and input features, all in the same latent space. The framework is unsupervised and works without any labels for the input features; it needs only access to the network's internal activations for each input example, which makes it widely applicable. We extensively evaluate the proposed method and demonstrate that it produces human-understandable and coherent concepts learned by a ResNet-18 on the CIFAR-100 dataset.
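The abstract does not name the algorithm used to place examples, neurons, and input features in one latent space. As a minimal sketch of how such a shared space could be built, the block below assumes a coupled non-negative matrix factorization: a feature matrix `X` and an activation matrix `A` are factorized together with a common example factor `E`. The function names, the rank, the multiplicative-update rule, and the random stand-in data are all assumptions made for illustration and should not be read as the authors' implementation.

```python
import numpy as np


def coupled_nmf(X, A, rank, n_iter=200, eps=1e-9, seed=0):
    """Jointly factorize X ~ P @ E.T and A ~ N @ E.T with a shared factor E.

    X: (n_features, n_examples) non-negative input features (e.g. flattened pixels)
    A: (n_neurons, n_examples) non-negative activations (e.g. post-ReLU outputs)
    Returns P (features x rank), N (neurons x rank), E (examples x rank).
    Assumption: multiplicative updates for the summed squared-error objective.
    """
    rng = np.random.default_rng(seed)
    P = rng.random((X.shape[0], rank))
    N = rng.random((A.shape[0], rank))
    E = rng.random((X.shape[1], rank))
    for _ in range(n_iter):
        # Each update rescales one factor while the other two are held fixed.
        P *= (X @ E) / (P @ (E.T @ E) + eps)
        N *= (A @ E) / (N @ (E.T @ E) + eps)
        E *= (X.T @ P + A.T @ N) / (E @ (P.T @ P + N.T @ N) + eps)
    return P, N, E


def relative_error(X, A, P, N, E):
    """Combined relative reconstruction error over both matrices."""
    num = np.linalg.norm(X - P @ E.T) ** 2 + np.linalg.norm(A - N @ E.T) ** 2
    den = np.linalg.norm(X) ** 2 + np.linalg.norm(A) ** 2
    return float(np.sqrt(num / den))


# Random stand-in data (hypothetical sizes); real use would take X from the
# dataset and A from a forward pass through the network.
rng = np.random.default_rng(0)
X = rng.random((48, 30))  # 48 input features, 30 examples
A = rng.random((16, 30))  # 16 neurons, the same 30 examples

P, N, E = coupled_nmf(X, A, rank=4)

# Assign every example, neuron, and feature to its strongest latent factor,
# giving three clusterings that share one set of candidate "concepts".
example_concept = E.argmax(axis=1)
neuron_concept = N.argmax(axis=1)
feature_concept = P.argmax(axis=1)
```

Because all three factor matrices share the same columns, a single latent factor groups a set of examples, the neurons that respond to them, and the input features they have in common. No labels enter the computation, which matches the unsupervised setting the abstract describes.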

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification