PACE: Posthoc Architecture-Agnostic Concept Extractor for Explaining CNNs
Vidhya Kamakshi, Uday Gupta, Narayanan C Krishnan

TL;DR
PACE is a novel posthoc method that automatically extracts human-interpretable, class-specific concepts from CNNs, enhancing trust and understanding of the model's decision process across different architectures.
Contribution
PACE is presented as, to the best of the authors' knowledge, the first posthoc method that automatically extracts class-specific discriminative concepts. It is architecture-agnostic and aims to improve the interpretability of CNNs.
Findings
Over 72% of the concepts extracted by PACE are human interpretable.
Generates explanations for two different CNN architectures, trained on the AWA2 and Imagenet-Birds datasets.
Validated through extensive human subject experiments.
Abstract
Deep CNNs, though they have achieved state-of-the-art performance in image classification tasks, remain a black box to the humans using them. There is growing interest in explaining the workings of these deep models to improve their trustworthiness. In this paper, we introduce a Posthoc Architecture-agnostic Concept Extractor (PACE) that automatically extracts smaller sub-regions of the image, called concepts, that are relevant to the black-box prediction. PACE tightly integrates the faithfulness of the explanatory framework to the black-box model. To the best of our knowledge, this is the first work that extracts class-specific discriminative concepts in a posthoc manner automatically. The PACE framework is used to generate explanations for two different CNN architectures trained for classifying the AWA2 and Imagenet-Birds datasets. Extensive human subject experiments are conducted to validate the…
