Understanding Multimodal Deep Neural Networks: A Concept Selection View

Chenming Shang; Hengyuan Zhang; Hao Wen; Yujiu Yang

arXiv:2404.08964·cs.CV·April 16, 2024·1 cites

Open Access

TL;DR

This paper introduces a two-stage concept selection model for understanding multimodal neural networks such as CLIP. The model extracts core concepts without human labels, yields interpretable concepts, and achieves performance comparable to black-box models.

Contribution

A novel two-stage concept selection approach that identifies core concepts in multimodal models without relying on human-labeled data.

Findings

01. Achieves comparable performance to black-box models.

02. Concepts discovered are interpretable and comprehensible.

03. Effective in mining core concepts without human priors.

Abstract

Multimodal deep neural networks, represented by CLIP, have enabled rich downstream applications owing to their excellent performance, which makes understanding the decision-making process of CLIP an essential research topic. Due to its complex structure and massive pre-training data, CLIP is often regarded as a black-box model that is too difficult to understand and interpret. Concept-based models map the black-box visual representations extracted by deep neural networks onto a set of human-understandable concepts and use those concepts to make predictions, enhancing the transparency of the decision-making process. However, these methods rely on datasets labeled with fine-grained attributes by experts, which incurs high costs and introduces excessive human prior knowledge and bias. In this paper, we observe the long-tail distribution of concepts, based on which we…
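The generic concept-based pipeline described in the abstract (map visual representations onto human-understandable concepts, then predict from the concept scores) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's two-stage concept selection method: the embeddings are random stand-ins for CLIP image and text features, and the names `concept_scores`, `predict`, `W`, and `b` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def concept_scores(image_emb, concept_emb):
    """Cosine similarity between every image and every concept embedding."""
    return l2_normalize(image_emb) @ l2_normalize(concept_emb).T


def predict(image_emb, concept_emb, W, b):
    """Predict class labels from concept scores only (a concept bottleneck)."""
    scores = concept_scores(image_emb, concept_emb)  # (n_images, n_concepts)
    logits = scores @ W + b                          # (n_images, n_classes)
    return logits.argmax(axis=1), scores


# Assumption: random vectors stand in for real CLIP features.
rng = np.random.default_rng(0)
image_emb = rng.normal(size=(4, 512))     # 4 images
concept_emb = rng.normal(size=(10, 512))  # 10 text concepts
W = rng.normal(size=(10, 3))              # untrained linear head over 3 classes
b = np.zeros(3)

labels, scores = predict(image_emb, concept_emb, W, b)
```

Because the classifier sees only the concept scores, each prediction can be traced back to the concepts with the largest weighted contribution, which is the source of the transparency the abstract refers to.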

Taxonomy

Topics: Neural Networks and Applications

Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training