Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks' Internal Representations

Aditya Taparia; Som Sagar; Ransalu Senanayake

arXiv:2408.13438·cs.CV·June 9, 2025

TL;DR

This paper introduces a reinforcement learning-based method to automatically generate meaningful visual concepts for neural network explanations, reducing manual effort and improving the discovery of important high-level features.

Contribution

It proposes a novel RL-based approach to optimize vision-language models for automatic concept generation, enhancing explainability of neural networks.

Findings

01 Efficiently generates diverse meaningful concepts

02 Reduces manual effort in concept set creation

03 Improves understanding of neural network internal representations

Abstract

Understanding the inner representation of a neural network helps users improve models. Concept-based methods have become a popular choice for explaining deep neural networks post-hoc because, unlike most other explainable AI techniques, they can be used to test high-level visual "concepts" that are not directly related to feature attributes. For instance, the concept of "stripes" is important to classify an image as a zebra. Concept-based explanation methods, however, require practitioners to guess and manually collect multiple candidate concept image sets, making the process labor-intensive and prone to overlooking important concepts. Addressing this limitation, in this paper, we frame concept image set creation as an image generation problem. However, since naively using a standard generative model does not result in meaningful concepts, we devise a reinforcement learning-based…
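To make the "stripes" and "zebra" example concrete, the following is a minimal sketch of how a concept-based test can score a concept, in the style of TCAV-like methods. It is not the authors' pipeline: the abstract does not specify the scoring procedure, so the function names, the toy activations and gradients, and the difference-of-means direction (a simplified stand-in for training a linear classifier) are all assumptions made for illustration.

```python
import numpy as np


def concept_activation_vector(concept_acts, random_acts):
    """Unit vector pointing from random activations toward concept activations.

    Assumption: the difference of means is used here as a simplified
    stand-in for the normal of a linear classifier separating the two sets.
    """
    direction = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    norm = np.linalg.norm(direction)
    if norm == 0:
        raise ValueError("concept and random activations have identical means")
    return direction / norm


def concept_sensitivity_score(class_logit_grads, cav):
    """Fraction of inputs whose class logit increases along the concept direction."""
    directional_derivs = class_logit_grads @ cav
    return float(np.mean(directional_derivs > 0))


# Toy data (hypothetical): layer activations for "stripes" images vs. random images.
rng = np.random.default_rng(0)
dim = 8
stripes_direction = np.zeros(dim)
stripes_direction[0] = 1.0

concept_acts = rng.normal(size=(50, dim)) + 3.0 * stripes_direction
random_acts = rng.normal(size=(50, dim))
cav = concept_activation_vector(concept_acts, random_acts)

# Toy gradients of the "zebra" logit with respect to the same layer,
# constructed so that they mostly align with the stripes direction.
zebra_grads = rng.normal(scale=0.1, size=(100, dim)) + stripes_direction
score = concept_sensitivity_score(zebra_grads, cav)
print(f"fraction of zebra inputs sensitive to 'stripes': {score:.2f}")
```

A score near 1 would suggest the class prediction is consistently sensitive to the concept. The manual step the abstract criticizes is collecting the concept image set behind `concept_acts`, which is the part the paper proposes to generate automatically.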

Videos

Explainable Concept Generation through Vision-Language Preference Learning for Understanding Neural Networks' Internal Representations · slideslive

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling

Methods: Sparse Evolutionary Training