Concept backpropagation: An Explainable AI approach for visualising learned concepts in neural network models

Patrik Hammersborg; Inga Strümke

arXiv:2307.12601·cs.LG·July 25, 2023

TL;DR

Concept backpropagation is a novel explainable AI method that visualizes how neural networks internalize specific concepts by perturbing inputs guided by concept probes, enhancing interpretability across various modalities.

Contribution

This work extends concept detection methods with concept backpropagation, enabling visualization of concept representations within neural networks through input perturbation guided by trained probes.
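
Concept backpropagation builds on concept detection, in which a probe is trained to read a concept out of a model's intermediate activations. As a rough, hypothetical illustration of that detection step (not the authors' code), the sketch below fits a logistic-regression probe on the hidden layer of a frozen toy network. The one-hidden-layer model, the synthetic concept "first input feature is positive", and the helper names `hidden` and `train_probe` are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen toy model: inputs of shape (n, 4) -> 16 hidden activations.
W1 = 0.5 * rng.normal(size=(16, 4))
b1 = 0.1 * rng.normal(size=16)


def hidden(x):
    """Intermediate activations of the (assumed) frozen model; these are what we probe."""
    return np.tanh(x @ W1.T + b1)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_probe(acts, labels, lr=0.5, steps=2000):
    """Fit a linear logistic-regression concept probe by plain gradient descent."""
    w = np.zeros(acts.shape[1])
    b = 0.0
    for _ in range(steps):
        p = sigmoid(acts @ w + b)
        grad = p - labels  # gradient of binary cross-entropy w.r.t. each sample's logit
        w -= lr * acts.T @ grad / len(labels)
        b -= lr * grad.mean()
    return w, b


# Hypothetical binary concept: "the first input feature is positive".
X = rng.normal(size=(500, 4))
c = (X[:, 0] > 0).astype(float)

w_probe, b_probe = train_probe(hidden(X), c)
acc = ((sigmoid(hidden(X) @ w_probe + b_probe) > 0.5) == (c == 1)).mean()
print(f"probe accuracy on training data: {acc:.2f}")
```

A high probe accuracy is read as evidence that the layer linearly encodes the concept; a linear probe is used here only because it is the simplest choice.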

Findings

1. Effective visualization of concept internalization across modalities
2. Insights into concept entanglement within neural network representations
3. Potential for improved interpretability of black-box models

Abstract

Neural network models are widely used in a variety of domains, often as black-box solutions, since they are not directly interpretable by humans. The field of explainable artificial intelligence aims to develop explanation methods that address this challenge, and several approaches have been developed in recent years, including methods for investigating what type of knowledge these models internalise during the training process. Among these, the method of concept detection investigates which concepts neural network models learn to represent in order to complete their tasks. In this work, we present an extension to the method of concept detection, named concept backpropagation, which provides a way of analysing how the information representing a given concept is internalised in a given neural network model. In this approach, the model input is perturbed in a manner…
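
The abstract is cut off before it states exactly how the input is perturbed. A minimal sketch of the general idea, under stated assumptions: given a concept probe on a frozen model's activations, run gradient descent on the input so that the probe reports the concept, while a distance penalty keeps the result close to the original input. The toy model, the fixed (rather than trained) probe weights, the cross-entropy objective and the penalty weight `lam` are hypothetical choices for illustration, not the authors' formulation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical frozen toy model: x of shape (4,) -> hidden activations h of shape (16,).
W1 = 0.5 * rng.normal(size=(16, 4))
b1 = 0.1 * rng.normal(size=16)
# Hypothetical linear concept probe on h (assumed already trained; random here).
w_probe = rng.normal(size=16)
b_probe = 0.0


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def probe_output(x):
    """Probe's estimated probability that the concept is present for input x."""
    h = np.tanh(W1 @ x + b1)
    return sigmoid(w_probe @ h + b_probe)


def concept_backprop(x0, lam=0.01, lr=0.01, steps=1000):
    """Perturb x0 so the probe reports the concept while staying close to x0.

    Minimises  -log p(x) + lam * ||x - x0||^2  by gradient descent on the input x,
    where p(x) is the probe output and the target concept value is 1.
    """
    x = x0.copy()
    for _ in range(steps):
        h = np.tanh(W1 @ x + b1)
        p = sigmoid(w_probe @ h + b_probe)
        # Chain rule: d(-log p)/dx = -(1 - p) * W1^T (w_probe * (1 - h^2)).
        grad_concept = -(1.0 - p) * (W1.T @ (w_probe * (1.0 - h**2)))
        # Gradient of the distance penalty keeping x near the original input.
        grad_dist = 2.0 * lam * (x - x0)
        x -= lr * (grad_concept + grad_dist)
    return x


x0 = rng.normal(size=4)
x_star = concept_backprop(x0)
print("probe output before:", probe_output(x0))
print("probe output after: ", probe_output(x_star))
print("perturbation x* - x0:", x_star - x0)
```

In this sketch, the per-feature difference `x_star - x0` shows which parts of the input had to change for the probe to register the concept; for image inputs the same difference could be displayed as a per-pixel map.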

Code & Models

Repositories

patrik-ha/concept-backpropagation (official implementation, TensorFlow)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Cell Image Analysis Techniques