Do Concept Bottleneck Models Respect Localities?
Naveen Raman, Mateo Espinosa Zarlenga, Juyeon Heo, Mateja Jamnik

TL;DR
This paper evaluates whether concept-based explainability models rely on relevant features when predicting concepts, a property the authors call locality. It finds that many models fail to distinguish relevant from irrelevant features, which calls their interpretability into question.
Contribution
The paper introduces three metrics to assess locality in concept predictors and provides theoretical analysis, highlighting limitations in current concept-based models.
Findings
Many concept-based models do not respect localities.
Concept predictors often rely on spurious features.
Current models struggle to distinguish relevant from irrelevant features.
Abstract
Concept-based explainability methods use human-understandable intermediaries to produce explanations for machine learning models. These methods assume concept predictions can help understand a model's internal reasoning. In this work, we assess the degree to which such an assumption is true by analyzing whether concept predictors leverage "relevant" features to make predictions, a term we call locality. Concept-based models that fail to respect localities also fail to be explainable because concept predictions are based on spurious features, making the interpretation of the concept predictions vacuous. To assess whether concept-based models respect localities, we construct and use three metrics to characterize when models respect localities, complementing our analysis with theoretical results. Each of our metrics captures a different notion of perturbation and assesses whether perturbing…
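The abstract is cut off before it defines the three metrics, so the sketch below does not reproduce them. It only illustrates the general idea of a perturbation-based locality check: add noise to the features assumed irrelevant to a concept and measure how much the concept prediction moves. The function names (`prediction_shift`, `make_linear_concept_predictor`), the linear-sigmoid toy predictor, the Gaussian noise, and the feature masks are all hypothetical choices made for illustration.

```python
import numpy as np


def prediction_shift(predict, x, perturb_mask, noise_scale=1.0, n_samples=100, seed=0):
    """Mean absolute change in predict(x) when only masked features are noised.

    Hypothetical helper for illustration; not one of the paper's metrics.
    """
    rng = np.random.default_rng(seed)
    base = predict(x)
    shifts = []
    for _ in range(n_samples):
        noise = rng.normal(0.0, noise_scale, size=x.shape)
        # Noise is applied only where perturb_mask is 1; other features are unchanged.
        x_perturbed = x + noise * perturb_mask
        shifts.append(abs(predict(x_perturbed) - base))
    return float(np.mean(shifts))


def make_linear_concept_predictor(weights):
    """Toy concept predictor: sigmoid of a linear score (an assumed stand-in)."""
    w = np.asarray(weights, dtype=float)

    def predict(x):
        return 1.0 / (1.0 + np.exp(-float(x @ w)))

    return predict


# Assumed setup: the first two features are relevant to the concept, the last two are not.
x = np.array([1.0, -0.5, 2.0, 0.3])
relevant = np.array([1.0, 1.0, 0.0, 0.0])
irrelevant = 1.0 - relevant

# A predictor that ignores the irrelevant features, and one that leaks onto them.
local_model = make_linear_concept_predictor([1.5, -2.0, 0.0, 0.0])
leaky_model = make_linear_concept_predictor([1.5, -2.0, 1.0, -1.0])

local_shift = prediction_shift(local_model, x, irrelevant)
leaky_shift = prediction_shift(leaky_model, x, irrelevant)
print(f"local model shift: {local_shift:.4f}")
print(f"leaky model shift: {leaky_shift:.4f}")
```

Under these assumptions, a predictor that respects locality should show little or no shift when only the irrelevant features are perturbed, while a predictor that relies on spurious features should show a clearly larger one.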
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning · Machine Learning in Materials Science
