When are Post-hoc Conceptual Explanations Identifiable?
Tobias Leemann, Michael Kirchhof, Yao Rong, Enkelejda Kasneci, Gjergji Kasneci

TL;DR
This paper establishes conditions under which post-hoc concept discovery in embedding spaces is provably reliable, connects classical methods such as PCA and ICA to concept recovery, and proposes new approaches for dependent concepts that improve alignment with ground truth by up to 29% over competitors.
Contribution
The paper introduces identifiability as a requirement for concept discovery, meaning that a number of known concepts can be provably recovered. It links this requirement to classical methods and proposes novel approaches for dependent concepts.
Findings
Proposed methods outperform competitors by up to 29% in alignment with ground truth.
Established formal conditions for reliable, label-free concept discovery.
Connected classical PCA/ICA methods to the recovery of independent concepts under non-Gaussian distributions.
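The PCA/ICA finding can be illustrated with a toy sketch. This is not the paper's method or code: it assumes two independent, uniformly distributed "concepts", a hypothetical linear mixing matrix `A` standing in for the embedding, and a simple kurtosis-based rotation search as a simplified stand-in for ICA. PCA whitening removes correlations but leaves an unknown rotation; the non-Gaussianity of the sources is what pins that rotation down.

```python
import numpy as np


def excess_kurtosis(x):
    """Excess kurtosis of a 1-D sample (0 for a Gaussian)."""
    x = (x - x.mean()) / x.std()
    return np.mean(x ** 4) - 3.0


def rotation(theta):
    """2-D rotation matrix for angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def whiten(X):
    """PCA whitening: zero mean, identity covariance."""
    Xc = X - X.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ eigvec / np.sqrt(eigval)


def recover_sources(X, n_angles=181):
    """Whiten, then pick the rotation that maximizes non-Gaussianity."""
    Z = whiten(X)
    angles = np.linspace(0.0, np.pi / 2, n_angles)
    scores = [
        sum(abs(excess_kurtosis(col)) for col in (Z @ rotation(t).T).T)
        for t in angles
    ]
    best = angles[int(np.argmax(scores))]
    return Z @ rotation(best).T


def match_score(S_hat, S):
    """Worst-case best absolute correlation between recovered and true sources."""
    cross = np.abs(np.corrcoef(S_hat.T, S.T)[:2, 2:])
    return cross.max(axis=1).min()


rng = np.random.default_rng(0)
n = 20000
# Assumption: two independent, unit-variance, non-Gaussian (uniform) concepts.
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, 2))
# Assumption: hypothetical linear mixing standing in for the embedding map.
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = S @ A.T

pca_match = match_score(whiten(X), S)        # whitening only
ica_match = match_score(recover_sources(X), S)  # whitening + rotation search
print(f"PCA-only match: {pca_match:.3f}, kurtosis-rotation match: {ica_match:.3f}")
```

A match score near 1 means each recovered component lines up with one true concept, up to sign and ordering. With Gaussian sources the kurtosis score would be flat across rotations, so no angle would be preferred.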
Abstract
Interest in understanding and factorizing learned embedding spaces through conceptual explanations is steadily growing. When no human concept labels are available, concept discovery methods search trained embedding spaces for interpretable concepts like object shape or color that can provide post-hoc explanations for decisions. Unlike previous work, we argue that concept discovery should be identifiable, meaning that a number of known concepts can be provably recovered to guarantee reliability of the explanations. As a starting point, we explicitly make the connection between concept discovery and classical methods like Principal Component Analysis and Independent Component Analysis by showing that they can recover independent concepts under non-Gaussian distributions. For dependent concepts, we propose two novel approaches that exploit functional compositionality properties of…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI) · Machine Learning in Materials Science
Methods: Independent Component Analysis
