How Does Diverse Interpretability of Textual Prompts Impact Medical Vision-Language Zero-Shot Tasks?
Sicheng Wang, Che Liu, Rossella Arcucci

TL;DR
This paper systematically evaluates how the variability in textual prompts affects the robustness of medical vision-language models in zero-shot tasks, revealing significant sensitivity and instability across different prompt styles.
Contribution
First comprehensive assessment of prompt sensitivity in MedVLP, highlighting the models' instability and the impact of prompt interpretability on performance.
Findings
Models show unstable performance across prompt styles
Performance varies with prompt interpretability
Models struggle with complex medical concepts
Abstract
Recent advancements in medical vision-language pre-training (MedVLP) have significantly enhanced zero-shot medical vision tasks such as image classification by leveraging large-scale medical image-text pair pre-training. However, the performance of these tasks can be heavily influenced by the variability in textual prompts describing the categories, necessitating robustness in MedVLP models to diverse prompt styles. Yet, this sensitivity remains underexplored. In this work, we are the first to systematically assess the sensitivity of three widely-used MedVLP methods to a variety of prompts across 15 different diseases. To achieve this, we designed six unique prompt styles to mirror real clinical scenarios, which were subsequently ranked by interpretability. Our findings indicate that all MedVLP models evaluated show unstable performance across different prompt styles, suggesting a lack…
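As a rough illustration of the kind of measurement the abstract describes, the sketch below runs zero-shot classification by cosine similarity between image embeddings and class-prompt embeddings, then compares accuracy across prompt styles. It is not the authors' evaluation code: the function names, the two toy "styles", and the 2-dimensional embeddings are all hypothetical stand-ins for the outputs of a real MedVLP image and text encoder, and the paper's three models and six prompt styles are not reproduced here.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products become cosine similarities."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.clip(norm, 1e-12, None)


def zero_shot_predict(image_embs, prompt_embs):
    """Pick, for each image, the class whose prompt embedding is most similar.

    image_embs: (n_images, d) array; prompt_embs: (n_classes, d) array.
    Returns an (n_images,) array of predicted class indices.
    """
    sims = l2_normalize(image_embs) @ l2_normalize(prompt_embs).T
    return sims.argmax(axis=1)


def accuracy_by_style(image_embs, labels, prompts_by_style):
    """Compute zero-shot accuracy separately for each prompt style."""
    labels = np.asarray(labels)
    return {
        style: float((zero_shot_predict(image_embs, embs) == labels).mean())
        for style, embs in prompts_by_style.items()
    }


def accuracy_spread(acc):
    """Gap between the best and worst style: one simple sensitivity indicator."""
    values = list(acc.values())
    return max(values) - min(values)


# Hypothetical toy embeddings (assumption: a real encoder would produce these).
image_embs = np.array([[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]])
labels = [0, 0, 1, 1]
prompts_by_style = {
    # Prompts that separate the two classes cleanly.
    "style_a": np.array([[1.0, 0.0], [0.0, 1.0]]),
    # Prompts whose embeddings nearly coincide, so one class is poorly described.
    "style_b": np.array([[0.0, 1.0], [0.1, 1.0]]),
}

acc = accuracy_by_style(image_embs, labels, prompts_by_style)
spread = accuracy_spread(acc)
print(acc, spread)
```

In this toy setup the second style scores lower than the first even though the images and labels are unchanged, which is the form of prompt sensitivity the abstract refers to. The max-minus-min spread is only one possible summary and is an assumption of this sketch, not a metric taken from the paper.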
Taxonomy
Topics: Interpreting and Communication in Healthcare
