Explaining Explainability: Recommendations for Effective Use of Concept Activation Vectors

Angus Nicolson; Lisa Schut; J. Alison Noble; Yarin Gal

arXiv:2404.03713 · cs.LG · February 14, 2025 · 1 citation

TL;DR

This paper investigates properties of Concept Activation Vectors (CAVs) used in interpretability, identifies challenges like entanglement and spatial dependency, and offers tools and recommendations to improve concept-based explanations in deep learning models.

Contribution

It introduces tools to detect properties affecting CAV interpretability, demonstrates their impact on real tasks, and proposes methods to mitigate misleading explanations.

Findings

1. Entanglement affects interpretability of CAVs.
2. Negative probe set choice influences CAV meaning.
3. Spatially dependent CAVs can test translation invariance.

Abstract

Concept-based explanations translate the internal representations of deep learning models into a language that humans are familiar with: concepts. One popular method for finding concepts is Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars. In this work, we investigate three properties of CAVs: (1) inconsistency across layers, (2) entanglement with other concepts, and (3) spatial dependency. Each property provides both challenges and opportunities in interpreting models. We introduce tools designed to detect the presence of these properties, provide insight into how each property can lead to misleading explanations, and provide recommendations to mitigate their impact. To demonstrate practical applications, we apply our recommendations to a melanoma classification task, showing how entanglement can lead to uninterpretable results and that the…
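
As a rough illustration of how a CAV can be learnt from a probe dataset, the sketch below fits a linear probe that separates activations of concept exemplars from activations of a negative set, and takes the normalised weight vector as the CAV. It is not the authors' implementation: the activations are synthetic, and `fit_cav`, `true_direction`, and all hyperparameters are hypothetical names and values chosen for this example.

```python
import numpy as np


def fit_cav(concept_acts, negative_acts, lr=0.1, steps=500):
    """Fit a logistic-regression probe and return its unit-norm weight vector.

    Hypothetical helper for illustration; a real pipeline would pass in
    activations extracted from one layer of a trained model.
    """
    X = np.vstack([concept_acts, negative_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(negative_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept)
        w -= lr * (X.T @ (p - y)) / len(y)      # gradient step on the weights
        b -= lr * np.mean(p - y)                # gradient step on the bias
    return w / np.linalg.norm(w)


# Synthetic stand-in for layer activations (an assumption, not real model data):
# concept exemplars are shifted along one known direction.
rng = np.random.default_rng(0)
dim = 16
true_direction = np.zeros(dim)
true_direction[0] = 1.0
negative_acts = rng.normal(size=(200, dim))
concept_acts = rng.normal(size=(200, dim)) + 2.0 * true_direction

cav = fit_cav(concept_acts, negative_acts)
cosine = float(cav @ true_direction)
print(f"cosine similarity between CAV and planted direction: {cosine:.3f}")
```

Because the probe only learns what distinguishes the concept exemplars from the negative set, passing a different `negative_acts` array would generally yield a different vector, which is one way to see why the choice of negative probe set matters.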

Code & Models

Repositories

AngusNicolson/elements (PyTorch, official)

Taxonomy

Topics: Advanced Text Analysis Techniques · Explainable Artificial Intelligence (XAI) · Data Visualization and Analytics

Methods: Sparse Evolutionary Training