Visual concept ranking uncovers medical shortcuts used by large multimodal models
Joseph D. Janizek, Sonnet Xu, Junayd Lateef, Roxana Daneshjou

TL;DR
This paper introduces Visual Concept Ranking (VCR), a method for identifying the visual concepts that influence large multimodal models (LMMs), and uses it to surface unexpected demographic performance gaps and visual feature dependencies in medical image classification.
Contribution
The paper presents VCR, a novel technique for uncovering visual concept dependencies in multimodal models, specifically applied to medical diagnosis tasks.
Findings
LMMs show unexpected performance gaps between demographic subgroups when prompted with demonstrating examples.
VCR generates hypotheses about which visual features the models depend on.
Manual validation confirms the relevance of identified visual concepts.
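As a rough illustration of the first finding, a subgroup performance gap can be measured by computing a metric separately for each demographic group and comparing the results. The sketch below is not taken from the paper: the function names, the choice of accuracy as the metric, and the toy labels are all assumptions made for illustration.

```python
from collections import defaultdict


def subgroup_accuracy(y_true, y_pred, groups):
    """Return accuracy per subgroup as a dict {group: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(per_group):
    """Largest difference in accuracy between any two subgroups."""
    values = list(per_group.values())
    return max(values) - min(values)


# Toy, made-up data (hypothetical): 1 = malignant, 0 = benign.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = subgroup_accuracy(y_true, y_pred, groups)
gap = accuracy_gap(per_group)
print(per_group, gap)
```

In a real audit the metric, the subgroup definitions, and the statistical test for the gap would follow the paper's own protocol, which this summary does not specify.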
Abstract
Ensuring the reliability of machine learning models in safety-critical domains such as healthcare requires auditing methods that can uncover model shortcomings. We introduce a method for identifying important visual concepts within large multimodal models (LMMs) and use it to investigate the behaviors these models exhibit when prompted with medical tasks. We primarily focus on the task of classifying malignant skin lesions from clinical dermatology images, with supplemental experiments including both chest radiographs and natural images. After showing how LMMs display unexpected gaps in performance between different demographic subgroups when prompted with demonstrating examples, we apply our method, Visual Concept Ranking (VCR), to these models and prompts. VCR generates hypotheses related to different visual feature dependencies, which we are then able to validate with manual…
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · AI in cancer detection
