Concept backpropagation: An Explainable AI approach for visualising learned concepts in neural network models
Patrik Hammersborg, Inga Strümke

TL;DR
Concept backpropagation is a novel explainable AI method that visualizes how neural networks internalize specific concepts by perturbing inputs guided by concept probes, enhancing interpretability across various modalities.
Contribution
This work extends concept detection methods with concept backpropagation, enabling visualization of concept representations within neural networks through input perturbation guided by trained probes.
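Concept detection relies on a trained probe that reads a concept from a model's internal activations. The page does not say which probe architecture the authors use. The sketch below therefore assumes the common choice of a linear (logistic-regression) probe fitted by plain gradient descent. The activations, labels and helper names (`train_probe`, `probe_accuracy`) are hypothetical and exist only for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_probe(acts, labels, lr=0.5, epochs=500):
    """Fit a hypothetical linear concept probe sigmoid(w . h + c) on hidden
    activations `acts` with binary concept labels, by full-batch gradient
    descent on the binary cross-entropy loss."""
    dim, n = len(acts[0]), len(acts)
    w, c = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gc = [0.0] * dim, 0.0
        for h, y in zip(acts, labels):
            # For BCE with a sigmoid output, dLoss/dz = p - y.
            err = sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + c) - y
            gw = [g + err * hi for g, hi in zip(gw, h)]
            gc += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        c -= lr * gc / n
    return w, c

def probe_accuracy(w, c, acts, labels):
    """Fraction of samples where the thresholded probe output matches the label."""
    preds = [1 if sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + c) >= 0.5 else 0
             for h in acts]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

# Made-up 2-D hidden activations; label 1 means "concept present".
acts = [[1.0, 0.2], [0.8, -0.5], [1.5, 0.3], [-1.0, 0.1], [-0.7, -0.4], [-1.2, 0.6]]
labels = [1, 1, 1, 0, 0, 0]
w, c = train_probe(acts, labels)
acc = probe_accuracy(w, c, acts, labels)
```

A high probe accuracy suggests that the concept is linearly decodable from that layer. This is the signal that concept backpropagation then uses to guide input perturbation.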
Findings
Effective visualization of concept internalization across modalities
Insights into concept entanglement within neural network representations
Potential for improved interpretability of black-box models
Abstract
Neural network models are widely used in a variety of domains, often as black-box solutions, since they are not directly interpretable by humans. The field of explainable artificial intelligence aims to develop explanation methods that address this challenge, and several approaches have been proposed in recent years, including methods for investigating what type of knowledge these models internalise during training. Among these, concept detection investigates which concepts neural network models learn to represent in order to complete their tasks. In this work, we present an extension to concept detection, named concept backpropagation, which provides a way of analysing how the information representing a given concept is internalised in a given neural network model. In this approach, the model input is perturbed in a manner…
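The abstract breaks off before it states exactly how the input is perturbed. The following is therefore only a minimal sketch under stated assumptions, not the authors' method. It assumes a differentiable model with one tanh hidden layer and a trained sigmoid concept probe on that layer. The perturbation `delta` is found by gradient descent on a hypothetical objective: the squared error between the probe output and a target concept value, plus an L2 penalty that keeps `delta` small. All weights and names (`hidden`, `probe`, `concept_backprop`, `lam`) are illustrative.

```python
import math

def hidden(W, b, x):
    """Hypothetical one-layer model: h = tanh(W x + b)."""
    return [math.tanh(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def probe(v, c, h):
    """Hypothetical trained linear concept probe: sigmoid(v . h + c)."""
    z = sum(vi * hi for vi, hi in zip(v, h)) + c
    return 1.0 / (1.0 + math.exp(-z))

def concept_backprop(x, W, b, v, c, target=1.0, lam=0.01, lr=0.5, steps=200):
    """Find a perturbation delta that pushes the probe output on x + delta
    towards `target`, minimising (p - target)^2 + lam * ||delta||^2
    (an assumed objective) by gradient descent on delta."""
    delta = [0.0] * len(x)
    for _ in range(steps):
        h = hidden(W, b, [xi + di for xi, di in zip(x, delta)])
        p = probe(v, c, h)
        dz = 2.0 * (p - target) * p * (1.0 - p)                     # dL/dz via the sigmoid
        dpre = [dz * vi * (1.0 - hi ** 2) for vi, hi in zip(v, h)]  # back through tanh
        grad = [sum(dpre[i] * W[i][j] for i in range(len(W)))      # back to the input
                + 2.0 * lam * delta[j]                              # L2 penalty term
                for j in range(len(x))]
        delta = [d - lr * g for d, g in zip(delta, grad)]
    return delta

# Toy example with made-up weights (not taken from the paper).
x = [0.0, 0.0]
W = [[1.0, -0.5], [0.3, 0.8], [-0.7, 0.2]]
b = [0.0, 0.0, 0.0]
v, c = [1.5, 1.0, -1.0], -0.5

delta = concept_backprop(x, W, b, v, c)
p_before = probe(v, c, hidden(W, b, x))
p_after = probe(v, c, hidden(W, b, [xi + di for xi, di in zip(x, delta)]))
```

Inspecting `delta` shows which input dimensions the model relies on to represent the probed concept. For images, the same idea would be visualised as a perturbation map over pixels. The L2 penalty is one way of keeping the perturbation minimal. Other constraints are equally plausible.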
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Cell Image Analysis Techniques
