Towards Automatic Concept-based Explanations
Amirata Ghorbani, James Wexler, James Zou, Been Kim

TL;DR
This paper introduces ACE, a new algorithm for automatically extracting human-understandable visual concepts from machine learning models, aiming to improve interpretability by summarizing feature importance at a higher conceptual level.
Contribution
The paper proposes principles for concept-based explanations and develops ACE, an algorithm that automatically identifies meaningful visual concepts across datasets.
Findings
ACE discovers human-meaningful concepts
Concepts are coherent and relevant to predictions
Method enhances interpretability of neural networks
Abstract
Interpretability has become an important topic of research as more machine learning (ML) models are deployed and widely used to make important decisions. Most of the current explanation methods provide explanations through feature importance scores, which identify features that are important for each individual input. However, how to systematically summarize and interpret such per-sample feature importance scores is itself challenging. In this work, we propose principles and desiderata for concept-based explanation, which goes beyond per-sample features to identify higher-level human-understandable concepts that apply across the entire dataset. We develop a new algorithm, ACE, to automatically extract visual concepts. Our systematic experiments demonstrate that ACE discovers concepts that are human-meaningful, coherent and important for the neural network's predictions.
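The abstract does not describe how ACE works internally, so the following is only a loose sketch of the general idea of moving from per-sample pieces to dataset-level concepts. It assumes each image has already been split into segments and that each segment has been mapped to an embedding vector. It then groups those embeddings across the whole dataset, so that each group is a candidate concept. The function names (`group_segments_into_concepts`, `farthest_point_init`, `assign`), the use of k-means, and the synthetic embeddings are all assumptions made for illustration, not the authors' implementation. Scoring how important each group is for the model's predictions is not shown.

```python
import numpy as np


def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from every centroid chosen so far (deterministic)."""
    chosen = [0]
    min_dist = ((points - points[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(min_dist.argmax())
        chosen.append(nxt)
        new_dist = ((points - points[nxt]) ** 2).sum(axis=1)
        min_dist = np.minimum(min_dist, new_dist)
    return points[chosen].astype(float)


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def group_segments_into_concepts(embeddings, n_concepts, n_iter=20):
    """Hypothetical helper: cluster segment embeddings with plain k-means.

    Each resulting cluster is treated as one candidate concept shared
    across the dataset. Returns (labels, centroids).
    """
    embeddings = np.asarray(embeddings, dtype=float)
    centroids = farthest_point_init(embeddings, n_concepts)
    for _ in range(n_iter):
        labels = assign(embeddings, centroids)
        for j in range(n_concepts):
            members = embeddings[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return assign(embeddings, centroids), centroids


# Synthetic stand-in for segment embeddings: three well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
embeddings = np.concatenate(
    [c + 0.3 * rng.standard_normal((20, 2)) for c in centers]
)
labels, _ = group_segments_into_concepts(embeddings, n_concepts=3)
print(np.bincount(labels))  # number of segments assigned to each candidate concept
```

In a real setting the embeddings would come from a trained network rather than a random generator, and the number of concepts would have to be chosen or tuned; both are left open here.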
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Multimodal Machine Learning Applications
