Envisioning MedCLIP: A Deep Dive into Explainability for Medical Vision-Language Models
Anees Ur Rehman Hashmi, Dwarikanath Mahapatra, Mohammad Yaqub

TL;DR
This paper evaluates the effectiveness of explainability methods for MedCLIP, a medical vision-language model, and proposes a simple approach to improve interpretability, addressing a critical gap for safe deployment in healthcare.
Contribution
It provides a comprehensive analysis of explainability techniques on MedCLIP and introduces a methodology to enhance their performance in medical vision-language models.
Findings
Existing explainability methods show clear limitations when used to interpret MedCLIP.
A simple methodology can improve explainability performance.
The assessment approach is applicable to other vision-language models.
Abstract
Explaining deep learning models is becoming increasingly important as new multimodal models emerge almost daily, particularly in safety-critical domains such as medical imaging. However, the lack of detailed investigations into how explainability methods perform on these models is widening the gap between their development and their safe deployment. In this work, we analyze the performance of various explainable AI (XAI) methods on a vision-language model (VLM), MedCLIP, to demystify its inner workings. We also provide a simple methodology to overcome the shortcomings of these methods. Our work offers a new perspective on the explainability of a recent, well-known VLM in the medical domain, and our assessment method generalizes to other current and possible future VLMs.
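The summary does not list which attribution methods were evaluated or spell out the proposed methodology. As a generic illustration of one common family of perturbation-based explainability methods, the sketch below computes an occlusion map for the image-text similarity score of a CLIP-style model. The toy linear encoder, the 32-dimensional embedding, and the names `encode_image` and `occlusion_map` are hypothetical stand-ins; this is neither MedCLIP's API nor the authors' method.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors (small epsilon avoids division by zero)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def occlusion_map(image, text_emb, encode_image, patch=4, fill=0.0):
    """Occlusion sensitivity for an image-text similarity score.

    Each patch x patch window is replaced by `fill`, and the map records how much
    the cosine similarity to `text_emb` drops. Positive values mean the region
    supported the image-text match; negative values mean it worked against it.
    """
    h, w = image.shape
    base = cosine(encode_image(image), text_emb)
    heat = np.zeros((h, w))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill
            drop = base - cosine(encode_image(occluded), text_emb)
            heat[i:i + patch, j:j + patch] = drop
    return heat


# Toy stand-in for a vision encoder (hypothetical, NOT MedCLIP): a fixed linear
# projection of a 16x16 image to a 32-d embedding that ignores the right half.
rng = np.random.default_rng(0)
weights = rng.normal(size=(16, 16, 32))
weights[:, 8:, :] = 0.0  # pixels in columns 8..15 never reach the embedding
proj = weights.reshape(16 * 16, 32)


def encode_image(x: np.ndarray) -> np.ndarray:
    return x.reshape(-1) @ proj


image = rng.random((16, 16))
text_emb = rng.normal(size=32)  # hypothetical text embedding for a prompt
heat = occlusion_map(image, text_emb, encode_image, patch=4)
print(heat.shape)  # (16, 16)
```

Because the toy encoder never sees the right half of the image, every right-half entry of `heat` is zero. Comparing an attribution map against regions the model is known to ignore is one simple way to check whether an explainability method is faithful to the model it explains.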
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling
