Towards Concept-based Interpretability of Skin Lesion Diagnosis using Vision-Language Models
Cristiano Patrício, Luís F. Teixeira, João C. Neves

TL;DR
This paper demonstrates that vision-language models like CLIP can improve skin lesion diagnosis interpretability and accuracy using concept-based descriptions, even with limited annotated data.
Contribution
It introduces an embedding learning strategy to adapt CLIP for skin lesion classification with concept descriptions, reducing the need for extensive concept-annotated datasets.
Findings
Vision-language models attain better accuracy when concept descriptions are used as textual embeddings.
Fewer concept-annotated samples are needed for comparable performance.
The approach enhances interpretability of skin lesion diagnosis models.
Abstract
Concept-based models naturally lend themselves to the development of inherently interpretable skin lesion diagnosis, as medical experts make decisions based on a set of visual patterns of the lesion. Nevertheless, the development of these models depends on the existence of concept-annotated datasets, whose availability is scarce due to the specialized knowledge and expertise required in the annotation process. In this work, we show that vision-language models can be used to alleviate the dependence on a large number of concept-annotated samples. In particular, we propose an embedding learning strategy to adapt CLIP to the downstream task of skin lesion classification using concept-based descriptions as textual embeddings. Our experiments reveal that vision-language models not only attain better accuracy when using concepts as textual embeddings, but also require a smaller number of…
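To make the idea of "concepts as textual embeddings" concrete, here is a minimal sketch of concept-based scoring: an image embedding is compared against one text embedding per concept, and the per-concept similarities are then aggregated into a class decision. This is not the authors' embedding learning strategy, which the abstract does not detail. The concept names, the class-to-concept mapping, the mean aggregation, and the three-dimensional toy vectors are all assumptions made for illustration; in practice the embeddings would come from CLIP's image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def concept_scores(image_emb, concept_embs):
    """Similarity of the image embedding to each concept's text embedding."""
    return {name: cosine(image_emb, emb) for name, emb in concept_embs.items()}

def classify(scores, class_concepts):
    """Average the scores of the concepts tied to each class (assumed rule)."""
    class_scores = {
        label: sum(scores[c] for c in concepts) / len(concepts)
        for label, concepts in class_concepts.items()
    }
    best = max(class_scores, key=class_scores.get)
    return best, class_scores

# Hypothetical toy embeddings standing in for CLIP text-encoder outputs.
concept_embs = {
    "atypical pigment network": [1.0, 0.0, 0.0],
    "blue-whitish veil": [0.0, 1.0, 0.0],
    "regular streaks": [0.0, 0.0, 1.0],
}

# Hypothetical mapping from diagnosis to supporting concepts.
class_concepts = {
    "melanoma": ["atypical pigment network", "blue-whitish veil"],
    "nevus": ["regular streaks"],
}

# Hypothetical image embedding standing in for a CLIP image-encoder output.
image_emb = [0.8, 0.5, 0.1]

scores = concept_scores(image_emb, concept_embs)
label, class_scores = classify(scores, class_concepts)
print(label, scores)
```

Because the prediction is derived from the per-concept scores, those scores can be read as the explanation for the decision, which is the interpretability argument the abstract makes.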
Taxonomy
Topics: Cancer-related molecular mechanisms research
Methods: Sparse Evolutionary Training · Contrastive Language-Image Pre-training
