Few-shot medical image classification with simple shape and texture text descriptors using vision-language models
Michal Byra, Muhammad Febrian Rachmadi, Henrik Skibbe

TL;DR
This paper explores using vision-language models (VLMs) and GPT-4 generated shape and texture descriptors for few-shot classification of medical images. It shows that the approach is viable and that accurate results depend on excluding certain descriptors from the classification scores.
Contribution
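It introduces a novel approach that combines GPT-4 generated text descriptors with VLMs pre-trained on natural images for medical image classification. It also examines which descriptors should be included in the classification scores and how well VLMs evaluate shape features.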
Findings
Few-shot classification is feasible with VLMs and GPT-4 descriptors.
Excluding certain descriptors improves classification accuracy.
VLMs can evaluate shape features in breast mass ultrasound images.
Abstract
In this work, we investigate the usefulness of vision-language models (VLMs) and large language models for binary few-shot classification of medical images. We use the GPT-4 model to generate text descriptors that capture the shape and texture characteristics of objects in medical images. These GPT-4 generated descriptors, together with VLMs pre-trained on natural images, are then used to classify chest X-rays and breast ultrasound images. Our results indicate that few-shot classification of medical images using VLMs and GPT-4 generated descriptors is a viable approach. However, accurate classification requires excluding certain descriptors from the calculation of the classification scores. Moreover, we assess the ability of VLMs to evaluate shape features in breast mass ultrasound images. We further investigate the degree of variability among the sets of text…
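The abstract does not spell out how the classification scores are computed, so the sketch below is only an illustration under stated assumptions. It assumes a "classification by description" style scheme: each class score is the mean cosine similarity between an image embedding and the embeddings of that class's text descriptors, and the predicted class is the one with the highest score. An optional exclusion set shows how dropping descriptors changes the scores. The embeddings are assumed to be precomputed by some VLM's image and text encoders (not shown). The function names, descriptor strings, and toy 2-D vectors are hypothetical, and the few-shot component of the paper is not modeled.

```python
import math
from typing import Dict, Optional, Sequence, Set

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def class_scores(
    image_emb: Vector,
    descriptor_embs: Dict[str, Dict[str, Vector]],
    excluded: Optional[Set[str]] = None,
) -> Dict[str, float]:
    """Hypothetical scoring rule: mean image-descriptor similarity per class,
    ignoring any descriptor text listed in `excluded`."""
    excluded = excluded or set()
    scores: Dict[str, float] = {}
    for label, descriptors in descriptor_embs.items():
        sims = [
            cosine(image_emb, emb)
            for text, emb in descriptors.items()
            if text not in excluded
        ]
        if not sims:
            raise ValueError(f"all descriptors for class {label!r} were excluded")
        scores[label] = sum(sims) / len(sims)
    return scores


def predict(
    image_emb: Vector,
    descriptor_embs: Dict[str, Dict[str, Vector]],
    excluded: Optional[Set[str]] = None,
) -> str:
    """Return the class label with the highest descriptor-based score."""
    scores = class_scores(image_emb, descriptor_embs, excluded)
    return max(scores, key=scores.get)


# Toy, made-up 2-D "embeddings" standing in for VLM text-encoder outputs.
toy_descriptors = {
    "benign": {"oval shape": [1.0, 0.0], "hypoechoic texture": [-1.0, 0.0]},
    "malignant": {"irregular shape": [0.0, 1.0], "spiculated margin": [1.0, 1.0]},
}
image = [1.0, 0.0]  # toy image embedding

print(predict(image, toy_descriptors))                                   # all descriptors
print(predict(image, toy_descriptors, excluded={"hypoechoic texture"}))  # one excluded
```

With all descriptors, the single poorly matching "benign" descriptor drags that class's mean down to 0.0 and the toy image is labeled "malignant" (score about 0.354). Excluding that descriptor raises the "benign" score to 1.0 and flips the prediction. This mirrors, in miniature, why the choice of which descriptors enter the score can matter; how the authors actually decide which descriptors to exclude is not described in the abstract.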
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · AI in cancer detection
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Layer Normalization · Label Smoothing · Adam · Residual Connection · Dense Connections · Dropout
