Labeling Neural Representations with Inverse Recognition

Kirill Bykov; Laura Kopf; Shinichi Nakajima; Marius Kloft; Marina M.-C. Höhne

arXiv:2311.13594 · cs.LG · January 19, 2024 · 1 citation

TL;DR

This paper introduces INVERT, a scalable and interpretable method to connect neural network representations with human-understandable concepts, overcoming limitations of existing explainability techniques.

Contribution

INVERT is a novel approach that handles diverse neurons, reduces computational costs, and provides statistical significance without needing segmentation masks.

Findings

01. Effectively identifies representations influenced by spurious correlations.

02. Interprets hierarchical decision structures within neural networks.

03. Offers an interpretable metric for representation-concept alignment.

Abstract

Deep Neural Networks (DNNs) demonstrate remarkable capabilities in learning complex hierarchical data representations, but the nature of these representations remains largely unknown. Existing global explainability methods, such as Network Dissection, face limitations such as reliance on segmentation masks, lack of statistical significance testing, and high computational demands. We propose Inverse Recognition (INVERT), a scalable approach for connecting learned representations with human-understandable concepts by leveraging their capacity to discriminate between these concepts. In contrast to prior work, INVERT is capable of handling diverse types of neurons, exhibits less computational complexity, and does not rely on the availability of segmentation masks. Moreover, INVERT provides an interpretable metric assessing the alignment between the representation and its corresponding…
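The idea of scoring a representation by how well it discriminates a concept can be illustrated with a small sketch. The excerpt above does not name the metric, so this example assumes AUROC (the probability that a concept-positive input receives a higher activation than a concept-negative one) purely for illustration. The function name `auroc` and the toy activations and labels are hypothetical and are not taken from the paper or its code.

```python
import numpy as np

def auroc(activations, has_concept):
    """Fraction of (positive, negative) input pairs in which the positive
    input has the higher activation; ties count as half a win."""
    a = np.asarray(activations, dtype=float)
    y = np.asarray(has_concept, dtype=bool)
    pos, neg = a[y], a[~y]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative example")
    # Pairwise comparison, O(n_pos * n_neg): fine for a small illustration.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Hypothetical toy data: one neuron's scalar activation on six inputs,
# and whether each input contains the concept of interest.
acts = [0.9, 0.8, 0.2, 0.7, 0.1, 0.3]
labels = [1, 1, 0, 1, 0, 0]
score = auroc(acts, labels)
```

Under this assumed metric, a score near 1 means the neuron fires more strongly on inputs containing the concept, a score near 0.5 means its activation carries no information about the concept, and only binary concept labels are needed rather than segmentation masks.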

Code & Models

Videos

Labeling Neural Representations with Inverse Recognition (SlidesLive)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Neural Networks and Applications

Methods: Network Dissection