Labeling Neural Representations with Inverse Recognition
Kirill Bykov, Laura Kopf, Shinichi Nakajima, Marius Kloft, Marina M.-C. Höhne

TL;DR
This paper introduces INVERT, a scalable and interpretable method to connect neural network representations with human-understandable concepts, overcoming limitations of existing explainability techniques.
Contribution
INVERT is a novel approach that handles diverse types of neurons, reduces computational cost, does not require segmentation masks, and supports statistical significance testing.
Findings
Effectively identifies representations influenced by spurious correlations.
Interprets hierarchical decision structures within neural networks.
Offers an interpretable metric for representation-concept alignment.
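One common way to make such an alignment metric interpretable, assumed here rather than stated in the summary above, is an AUC-style discrimination score. For a representation $f$ and a concept $c$, let $\mathcal{X}^+_c$ be samples labeled with the concept and $\mathcal{X}^-_c$ samples without it (the symbols are illustrative):

$$\mathrm{AUC}(f, c) = \frac{1}{|\mathcal{X}^+_c|\,|\mathcal{X}^-_c|}\sum_{x \in \mathcal{X}^+_c}\sum_{x' \in \mathcal{X}^-_c}\Big(\mathbb{1}[f(x) > f(x')] + \tfrac{1}{2}\,\mathbb{1}[f(x) = f(x')]\Big)$$

A value of 0.5 means the representation separates the concept no better than chance, and 1 means every concept sample activates more strongly than every non-concept sample. The double sum is the Mann-Whitney U statistic, so the same quantity can be tested for statistical significance.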
Abstract
Deep Neural Networks (DNNs) demonstrate remarkable capabilities in learning complex hierarchical data representations, but the nature of these representations remains largely unknown. Existing global explainability methods, such as Network Dissection, face limitations including reliance on segmentation masks, lack of statistical significance testing, and high computational demands. We propose Inverse Recognition (INVERT), a scalable approach for connecting learned representations with human-understandable concepts by leveraging their capacity to discriminate between these concepts. In contrast to prior work, INVERT is capable of handling diverse types of neurons, has lower computational complexity, and does not rely on the availability of segmentation masks. Moreover, INVERT provides an interpretable metric assessing the alignment between the representation and its corresponding…
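To make "discriminating between concepts" concrete, here is a minimal Python sketch under stated assumptions: a neuron's scalar activations over a dataset, binary per-sample concept labels (no segmentation masks), an AUC as the alignment score, and a one-sided Mann-Whitney U test (normal approximation, no tie correction) for significance. The helpers `auc_and_pvalue` and `label_neuron` and the toy data are hypothetical. This is not the authors' implementation, which may differ, for example in how candidate concepts are formed.

```python
import math
from typing import Dict, Sequence, Tuple


def auc_and_pvalue(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float]:
    """AUC for separating concept samples (pos) from the rest (neg) by
    activation, plus a one-sided Mann-Whitney U p-value (hypothetical helper)."""
    n_pos, n_neg = len(pos), len(neg)
    # U counts pairs where a concept sample activates more strongly; ties count half.
    u = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                u += 1.0
            elif p == q:
                u += 0.5
    auc = u / (n_pos * n_neg)
    # Under H0 (no difference), U has this mean and variance (ties ignored).
    mean_u = n_pos * n_neg / 2.0
    sd_u = math.sqrt(n_pos * n_neg * (n_pos + n_neg + 1) / 12.0)
    z = (u - mean_u) / sd_u
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z), one-sided
    return auc, p_value


def label_neuron(activations: Sequence[float],
                 concepts: Dict[str, Sequence[bool]]) -> Tuple[str, float, float]:
    """Pick the candidate concept whose labels the neuron separates best
    (highest AUC); returns (concept name, AUC, p-value). Hypothetical helper."""
    best = None
    for name, mask in concepts.items():
        pos = [a for a, m in zip(activations, mask) if m]
        neg = [a for a, m in zip(activations, mask) if not m]
        if not pos or not neg:
            continue  # a concept needs both positive and negative samples
        auc, p = auc_and_pvalue(pos, neg)
        if best is None or auc > best[1]:
            best = (name, auc, p)
    if best is None:
        raise ValueError("no concept has both positive and negative samples")
    return best
```

For example, with activations `[0.9, 0.8, 0.7, 0.2, 0.1, 0.05]`, a hypothetical concept `dog` labeled on the first three samples, and `grass` labeled on samples 2, 4 and 6, `label_neuron` returns `dog` with an AUC of 1.0 and a p-value of roughly 0.025. The pairwise loop is quadratic for readability; a rank-based computation gives the same U in O(n log n).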
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Neural Networks and Applications
Methods: Network Dissection
