Invertible Concept-based Explanations for CNN Models with Non-negative Concept Activation Vectors

Ruihan Zhang; Prashan Madumal; Tim Miller; Krista A. Ehinger; Benjamin I. P. Rubinstein

arXiv:2006.15417 · cs.CV · June 18, 2021

TL;DR

This paper introduces an invertible concept-based explanation framework for CNNs, utilizing non-negative matrix factorization to improve interpretability and fidelity of concept-level explanations.

Contribution

The paper proposes the ICE framework, which derives non-negative concept activation vectors (NCAVs) through matrix factorization and improves explanation quality over earlier methods such as ACE.

Findings

01 NCAVs outperform other methods in interpretability
02 NCAVs achieve higher fidelity in explanations
03 Framework provides both local and global explanations

Abstract

Convolutional neural network (CNN) models for computer vision are powerful but lack explainability in their most basic form. This deficiency remains a key challenge when applying CNNs in important domains. Recent work on explanations through feature importance of approximate linear models has moved from input-level features (pixels or segments) to features from mid-layer feature maps in the form of concept activation vectors (CAVs). CAVs contain concept-level information and could be learned via clustering. In this work, we rethink the ACE algorithm of Ghorbani et al., proposing an alternative invertible concept-based explanation (ICE) framework to overcome its shortcomings. Based on the requirements of fidelity (approximate models to target models) and interpretability (being meaningful to people), we design measurements and evaluate a range of matrix factorization methods with our…
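The abstract is cut off before it describes the method, so the following is only a minimal sketch of the general idea named in the TL;DR: factorizing non-negative mid-layer feature maps with non-negative matrix factorization so that each row of one factor can be read as a non-negative concept activation vector. The array layout (images × height × width × channels), the synthetic stand-in data, the function name `nmf`, and the multiplicative update rule are all assumptions made for illustration; they are not taken from the paper or from the official repository.

```python
import numpy as np

def nmf(V, n_concepts, n_iter=500, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (n x c) as S @ P using
    multiplicative updates for the squared Frobenius error.
    S (n x k) holds concept scores, P (k x c) holds the concept vectors."""
    rng = np.random.default_rng(seed)
    n, c = V.shape
    S = rng.random((n, n_concepts)) + eps
    P = rng.random((n_concepts, c)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so S and P stay >= 0.
        S *= (V @ P.T) / (S @ P @ P.T + eps)
        P *= (S.T @ V) / (S.T @ S @ P + eps)
    return S, P

# Synthetic stand-in for post-ReLU feature maps (assumed layout):
# 8 images, 7x7 spatial positions, 32 channels, built from 4 hidden concepts.
rng = np.random.default_rng(1)
hidden_scores = rng.random((8 * 7 * 7, 4))
hidden_vectors = rng.random((4, 32))
feature_maps = (hidden_scores @ hidden_vectors).reshape(8, 7, 7, 32)

# Flatten every spatial position into one row of channel activations.
V = feature_maps.reshape(-1, 32)
S, P = nmf(V, n_concepts=4)

# Per-image, per-position concept scores, usable as a concept heat map.
concept_maps = S.reshape(8, 7, 7, 4)

# "Invertible" here means the feature maps can be approximately rebuilt
# from the concept scores; the relative error is a rough fidelity proxy.
V_hat = S @ P
rel_error = np.linalg.norm(V - V_hat) / np.linalg.norm(V)
print(f"relative reconstruction error: {rel_error:.4f}")
```

In this sketch the rows of `P` play the role of the concept vectors and `S` gives how strongly each concept is present at each spatial position. Because both factors are non-negative, a concept can only add to the reconstruction, never cancel another one, which is the usual argument for why such parts-based factors are easier to read than signed ones.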

Code & Models

Repositories

zhangrh93/InvertibleCE (PyTorch, official)

Videos

Invertible Concept-Based Explanations for CNN Models with Non-Negative Concept Activation Vectors · Underline

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Advanced Neural Network Applications · Machine Learning in Materials Science

Methods: Interpretability