Concept Visualization: Explaining the CLIP Multi-modal Embedding Using WordNet

Loris Giulivi; Giacomo Boracchi

arXiv:2405.14563·cs.CV·May 24, 2024

TL;DR

This paper introduces Concept Visualization (ConVis), a novel method that uses WordNet to generate task-agnostic saliency maps for CLIP embeddings, enhancing explainability and interpretability of multi-modal models in vision tasks.

Contribution

The paper presents ConVis, a new explainability technique leveraging lexical information from WordNet to produce concept-based saliency maps for CLIP, independent of specific training classes.
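To make "concept-based saliency map" concrete, here is a minimal sketch under stated assumptions. It is not the ConVis implementation, and it does not call the CLIP API. It assumes you already have per-patch image embeddings and text embeddings in a shared CLIP-like space. A concept embedding is formed by averaging the embeddings of several lemmas, such as the synonyms of one WordNet synset. In practice these could come from NLTK's `wordnet.synset(...).lemma_names()`. Each patch is then scored by cosine similarity to that concept. The helper names `concept_embedding` and `patch_saliency`, and the random toy vectors, are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so cosine similarity becomes a dot product."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def concept_embedding(lemma_embeddings):
    """Hypothetical helper: average the unit-normalized text embeddings of a
    concept's lemmas (e.g. synonyms from one WordNet synset), then renormalize."""
    return l2_normalize(l2_normalize(lemma_embeddings).mean(axis=0))

def patch_saliency(patch_embeddings, concept, grid_shape):
    """Hypothetical helper: cosine similarity between each image patch and the
    concept, reshaped to the patch grid and min-max scaled to [0, 1]."""
    sims = (l2_normalize(patch_embeddings) @ concept).reshape(grid_shape)
    lo, hi = sims.min(), sims.max()
    return (sims - lo) / (hi - lo + 1e-12)

# Toy data: random vectors standing in for real CLIP features (assumption).
rng = np.random.default_rng(0)
dim, grid = 16, (4, 4)
lemmas = rng.normal(size=(3, dim))           # e.g. three synonyms of one synset
concept = concept_embedding(lemmas)
patches = rng.normal(size=(grid[0] * grid[1], dim))
patches[5] = concept * 10                    # plant one patch aligned with the concept
saliency = patch_saliency(patches, concept, grid)
peak = tuple(int(i) for i in np.unravel_index(int(saliency.argmax()), grid))
print(peak)  # expected: (1, 1), the planted patch in row 1, column 1
```

Averaging over lemmas is only one plausible way to turn lexical information into a single text query. The paper's actual use of WordNet may differ.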

Findings

01

ConVis accurately localizes semantic content in images.

02

It outperforms traditional saliency methods in out-of-distribution detection.

03

User studies confirm improved interpretability with ConVis.

Abstract

Advances in multi-modal embeddings, and in particular CLIP, have recently driven several breakthroughs in Computer Vision (CV). CLIP has shown impressive performance on a variety of tasks, yet its inherently opaque architecture may hinder the application of models employing CLIP as a backbone, especially in fields where trust and model explainability are imperative, such as the medical domain. Current explanation methodologies for CV models rely on Saliency Maps computed through gradient analysis or input perturbation. However, these Saliency Maps can only be computed to explain classes relevant to the end task, which are often smaller in scope than the backbone training classes. In the context of models implementing CLIP as their vision backbone, a substantial portion of the information embedded within the learned representations is thus left unexplained. In this work, we propose Concept…
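The abstract refers to saliency maps computed by input perturbation. The sketch below illustrates that generic family with occlusion: mask one tile at a time and record how much a model's score drops. It is not ConVis. `score_fn` is a hypothetical stand-in for any scalar model output, such as a class logit or an image-text similarity. The toy scorer exists only to make the example checkable.

```python
import numpy as np

def occlusion_saliency(image, score_fn, patch=4, fill=0.0):
    """Perturbation-based saliency: replace each non-overlapping patch x patch
    tile with `fill` and record the score drop relative to the unmodified image."""
    h, w = image.shape[:2]
    base = score_fn(image)
    rows, cols = h // patch, w // patch
    saliency = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            occluded = image.copy()
            occluded[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = fill
            saliency[r, c] = base - score_fn(occluded)
    return saliency

# Toy "model" (assumption): the score is the mean intensity of the top-left 4x4 region.
def toy_score(img):
    return float(img[:4, :4].mean())

img = np.ones((8, 8))
sal = occlusion_saliency(img, toy_score, patch=4)
print(sal)  # only the top-left tile matters: [[1. 0.] [0. 0.]]
```

Such a map can only be computed for an output the model already exposes, such as an end-task class score. This is the limitation the abstract points out for models built on a CLIP backbone.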

Code & Models

Repositories

loris2222/concept-visualization
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling

Methods: Contrastive Language-Image Pre-training