Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks' Internal Representations
Aditya Taparia, Som Sagar, Ransalu Senanayake

TL;DR
This paper introduces a reinforcement learning-based method to automatically generate meaningful visual concepts for neural network explanations, reducing manual effort and improving the discovery of important high-level features.
Contribution
It proposes a novel RL-based approach to optimize vision-language models for automatic concept generation, enhancing explainability of neural networks.
Findings
Efficiently generates diverse, meaningful concepts
Reduces manual effort in concept set creation
Improves understanding of neural network internal representations
Abstract
Understanding the inner representation of a neural network helps users improve models. Concept-based methods have become a popular choice for explaining deep neural networks post-hoc because, unlike most other explainable AI techniques, they can be used to test high-level visual "concepts" that are not directly related to feature attributes. For instance, the concept of "stripes" is important to classify an image as a zebra. Concept-based explanation methods, however, require practitioners to guess and manually collect multiple candidate concept image sets, making the process labor-intensive and prone to overlooking important concepts. Addressing this limitation, in this paper, we frame concept image set creation as an image generation problem. However, since naively using a standard generative model does not result in meaningful concepts, we devise a reinforcement learning-based…
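To make the "stripes" and zebra example concrete, here is a minimal sketch of how a concept-based test can be scored in activation space. It is not the authors' method. It assumes a simplified concept direction (the difference of mean activations between concept images and random images, standing in for a trained linear probe) and a finite-difference estimate of how the class logit changes along that direction. All function names, the toy head `toy_zebra_logit`, and the activation values are hypothetical and chosen only for illustration.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def concept_direction(concept_acts, random_acts):
    """Unit vector from the random-image mean to the concept-image mean.

    Assumption: a difference of means replaces a trained linear classifier.
    """
    mc = mean_vector(concept_acts)
    mr = mean_vector(random_acts)
    diff = [c - r for c, r in zip(mc, mr)]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def directional_derivative(logit_fn, act, direction, eps=1e-4):
    """Central finite-difference slope of the logit along `direction`."""
    plus = [a + eps * d for a, d in zip(act, direction)]
    minus = [a - eps * d for a, d in zip(act, direction)]
    return (logit_fn(plus) - logit_fn(minus)) / (2 * eps)


def concept_sensitivity(logit_fn, class_acts, direction):
    """Fraction of class examples whose logit increases along the concept."""
    positive = sum(
        1 for act in class_acts
        if directional_derivative(logit_fn, act, direction) > 0
    )
    return positive / len(class_acts)


def toy_zebra_logit(act):
    """Hypothetical stand-in for the network head above the probed layer."""
    weights = [1.5, -0.5]
    return sum(w * math.tanh(a) for w, a in zip(weights, act))


# Made-up 2-D activations: dimension 0 loosely plays the role of "stripes".
stripes_acts = [[2.0, 0.1], [1.8, -0.2], [2.2, 0.0]]
random_acts = [[0.0, 0.1], [-0.2, 0.0], [0.2, -0.1]]
zebra_acts = [[1.0, 0.5], [0.3, -0.4], [1.5, 0.2]]

direction = concept_direction(stripes_acts, random_acts)
score = concept_sensitivity(toy_zebra_logit, zebra_acts, direction)
print(f"fraction of zebra examples sensitive to the concept: {score}")
```

A score near 1 suggests the class logit consistently rises along the concept direction, while a score near 0 suggests the opposite. The sketch also shows where the manual effort described in the abstract comes from: `stripes_acts` must be computed from a hand-collected concept image set.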
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Sparse Evolutionary Training
