Explaining Explainability: Recommendations for Effective Use of Concept Activation Vectors
Angus Nicolson, Lisa Schut, J. Alison Noble, Yarin Gal

TL;DR
This paper investigates properties of Concept Activation Vectors (CAVs) used in interpretability, identifies challenges like entanglement and spatial dependency, and offers tools and recommendations to improve concept-based explanations in deep learning models.
Contribution
It introduces tools to detect properties affecting CAV interpretability, demonstrates their impact on real tasks, and proposes methods to mitigate misleading explanations.
Findings
Entanglement affects interpretability of CAVs.
Negative probe set choice influences CAV meaning.
Spatially dependent CAVs can be used to test whether a model is translation invariant.
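As a rough illustration of the entanglement finding, one simple check is the cosine similarity between two CAVs taken from the same layer: a value far from zero suggests the two concept directions overlap in activation space. The sketch below is an assumption for illustration, not necessarily the tool the paper introduces, and the three vectors are made-up toy values rather than learned CAVs.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction, 0 = orthogonal)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


# Hypothetical CAVs for three concepts in a 3-dimensional activation space.
cav_a = np.array([1.0, 0.0, 0.0])
cav_b = np.array([1.0, 1.0, 0.0])  # shares a component with cav_a
cav_c = np.array([0.0, 0.0, 1.0])  # orthogonal to cav_a

sim_ab = cosine_similarity(cav_a, cav_b)  # overlapping directions
sim_ac = cosine_similarity(cav_a, cav_c)  # independent directions
```

Here `sim_ab` is well above zero while `sim_ac` is zero, so under this toy setup concepts a and b would be flagged as potentially entangled and a and c would not.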
Abstract
Concept-based explanations translate the internal representations of deep learning models into a language that humans are familiar with: concepts. One popular method for finding concepts is Concept Activation Vectors (CAVs), which are learnt using a probe dataset of concept exemplars. In this work, we investigate three properties of CAVs: (1) inconsistency across layers, (2) entanglement with other concepts, and (3) spatial dependency. Each property provides both challenges and opportunities in interpreting models. We introduce tools designed to detect the presence of these properties, provide insight into how each property can lead to misleading explanations, and provide recommendations to mitigate their impact. To demonstrate practical applications, we apply our recommendations to a melanoma classification task, showing how entanglement can lead to uninterpretable results and that the…
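To make "learnt using a probe dataset of concept exemplars" concrete, here is a minimal sketch of one common way to obtain a CAV: fit a linear probe that separates activations of concept exemplars from activations of a negative set, then take the normalised weight vector as the concept direction. Everything below is an assumption for illustration: the function name `fit_cav`, the plain gradient-descent logistic regression, the hyperparameters, and the synthetic "activations" are not taken from the paper.

```python
import numpy as np


def fit_cav(concept_acts, negative_acts, lr=0.1, steps=500):
    """Fit a logistic-regression probe and return its unit-norm weight vector."""
    X = np.vstack([concept_acts, negative_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(negative_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(concept)
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient step on the weights
        b -= lr * np.mean(p - y)                 # gradient step on the bias
    return w / np.linalg.norm(w)


# Synthetic stand-in for layer activations: the concept shifts dimension 0.
rng = np.random.default_rng(0)
shift = np.zeros(16)
shift[0] = 2.0
concept_acts = rng.normal(size=(200, 16)) + shift
negative_acts = rng.normal(size=(200, 16))

cav = fit_cav(concept_acts, negative_acts)
```

Because the negative set defines what the probe contrasts the concept against, swapping `negative_acts` for a different sample changes the direction that comes back, which is one way to see why the choice of negative probe set influences what a CAV means.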
Taxonomy
Topics: Advanced Text Analysis Techniques · Explainable Artificial Intelligence (XAI) · Data Visualization and Analytics
Methods: Sparse Evolutionary Training
