A Concept-Based Explainability Framework for Large Multimodal Models

Jayneel Parekh; Pegah Khayatan; Mustafa Shukor; Alasdair Newson; Matthieu Cord

arXiv:2406.08074 · cs.LG · December 3, 2024 · 3 cites

Open Access · 1 repository · 1 video

TL;DR

This paper introduces a dictionary-learning-based framework for interpreting large multimodal models (LMMs): it extracts multimodal concepts, grounded in both vision and text, from the models' token representations to better understand their internal representations.

Contribution

It proposes an interpretability method for LMMs that applies dictionary learning to token representations in order to identify multimodal concepts and ground them in both vision and text.
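The page does not say which dictionary-learning algorithm is used. As a minimal sketch, assuming the representations of one token (one column per sample) are stacked into a matrix Z and factored as Z ≈ U V, the code below uses Semi-NMF (Ding, Li and Jordan's semi-non-negative matrix factorization). This is one plausible choice because LLM hidden states take both signs while per-sample concept activations are kept non-negative. The names `semi_nmf`, `Z`, `U`, `V` and `n_concepts` are hypothetical and not taken from the authors' code.

```python
import numpy as np


def _pos(A):
    # Element-wise positive part: (|A| + A) / 2
    return (np.abs(A) + A) / 2


def _neg(A):
    # Element-wise negative part, returned as a non-negative array: (|A| - A) / 2
    return (np.abs(A) - A) / 2


def semi_nmf(Z, n_concepts, n_iter=200, eps=1e-9, seed=0):
    """Hypothetical sketch: factor Z (D x N) as U @ V.

    Columns of Z: representations of one token across N samples (assumed setup).
    U (D x K): unconstrained dictionary, one column per concept.
    V (K x N): non-negative concept activations of each sample.
    """
    rng = np.random.default_rng(seed)
    _, n_samples = Z.shape
    G = rng.random((n_samples, n_concepts)) + eps  # G = V.T, kept non-negative
    for _ in range(n_iter):
        # Least-squares update of the dictionary for the current activations
        U = Z @ G @ np.linalg.pinv(G.T @ G)
        ZtU = Z.T @ U  # (N x K)
        UtU = U.T @ U  # (K x K)
        # Semi-NMF multiplicative update; every factor is >= 0, so G stays >= 0
        numer = _pos(ZtU) + G @ _neg(UtU)
        denom = _neg(ZtU) + G @ _pos(UtU) + eps
        G = G * np.sqrt(numer / denom)
    # Final dictionary matching the final activations
    U = Z @ G @ np.linalg.pinv(G.T @ G)
    return U, G.T
```

For instance, `U, V = semi_nmf(Z, n_concepts=20)` would yield 20 candidate concepts as the columns of `U`; the number of concepts and the layer the representations come from are assumptions left to the user. Because `U` is refit by least squares, `U @ V` is a projection of `Z`, so the reconstruction error never exceeds the norm of `Z`.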

Findings

1. Extracted semantically meaningful multimodal concepts.
2. Improved the interpretability of LMM internal representations.
3. Demonstrated the usefulness of the concepts for understanding model behavior.

Abstract

Large multimodal models (LMMs) combine unimodal encoders and large language models (LLMs) to perform multimodal tasks. Despite recent advances in the interpretability of these models, the internal representations of LMMs remain largely a mystery. In this paper, we present a novel framework for interpreting LMMs. We propose a dictionary-learning-based approach, applied to the representations of tokens. The elements of the learned dictionary correspond to our proposed concepts. We show that these concepts are semantically well grounded in both vision and text, and we therefore refer to them as "multimodal concepts". We evaluate the learned concepts qualitatively and quantitatively. We show that the extracted multimodal concepts are useful for interpreting the representations of test samples. Finally, we evaluate the disentanglement between different concepts…
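The abstract says the concepts are grounded in vision and text and used to interpret test samples, but does not spell out how. The sketch below shows one plausible procedure, under stated assumptions: textual grounding decodes each dictionary column through a hypothetical unembedding matrix `W_unembed` and keeps the top-scoring tokens; visual grounding lists the samples (for example, images) with the largest activation; and a test representation `z` is explained by its non-negative activations over the dictionary, computed here with projected gradient descent. All function and parameter names are illustrative, not the authors' API.

```python
import numpy as np


def ground_concepts(U, V, W_unembed, vocab, top_words=5, top_samples=5):
    """Hypothetical sketch: ground each concept (column of U) in text and in samples.

    Text: decode U[:, k] through W_unembed (vocab_size x D), keep the top tokens.
    Vision: indices of the samples (e.g. images) with the largest activation V[k].
    """
    logits = W_unembed @ U  # (vocab_size x K)
    words, samples = [], []
    for k in range(U.shape[1]):
        words.append([vocab[i] for i in np.argsort(-logits[:, k])[:top_words]])
        samples.append(np.argsort(-V[k])[:top_samples].tolist())
    return words, samples


def concept_activations(z, U, n_iter=500):
    """Non-negative v minimizing 0.5 * ||z - U @ v||^2, via projected gradient descent."""
    step = 1.0 / (np.linalg.norm(U, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    v = np.zeros(U.shape[1])
    for _ in range(n_iter):
        grad = U.T @ (U @ v - z)
        v = np.maximum(v - step * grad, 0.0)  # project onto v >= 0
    return v
```

Reading the largest entries of `concept_activations(z, U)` together with the words and samples returned by `ground_concepts` gives a per-sample explanation in terms of grounded concepts. The non-negativity constraint mirrors the assumed factorization above, so a test sample is described as an additive mix of concepts.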

Code & Models

Repositories

mshukor/xl-vlms (PyTorch, official)

Videos

A Concept-Based Explainability Framework for Large Multimodal Models (SlidesLive)

Taxonomy

Topics: Advanced Text Analysis Techniques · Semantic Web and Ontologies