The Effect of Negation on CLIP in Medical Imaging: Limitations of Contrastive Language-Image Pretraining
Jasmine Vu, Shivanand Sheshappanavar

TL;DR
This paper investigates the limitations of CLIP in understanding negation in medical imaging prompts, evaluates the Stanford AIMI CheXagent model on chest X-ray retrieval with and without negated prompts, and explores fine-tuning methods to improve negation handling for clinical applications.
Contribution
It provides a detailed analysis of CLIP's failure modes with negated medical prompts and proposes fine-tuning strategies to enhance its interpretative accuracy.
Findings
Improved negation handling with fine-tuning
Slight decrease in positive prompt accuracy
Insights into the model's internal representations
Abstract
Large vision-language models like CLIP are increasingly used in medical imaging tasks because they can align images and text without extensive labeled data. This makes them particularly useful for applications like image retrieval, report generation, and classification in clinical settings. A potential issue with this approach is that CLIP-based models often underperform when interpreting negated phrases, which is especially problematic in the context of medical diagnosis. In this study, we evaluate the Stanford AIMI CheXagent model on its ability to correctly retrieve chest X-ray images using prompts with and without negation. The goal of this project is to understand where this model fails and then use it as a base model, improving its retrieval accuracy with fine-tuning methods outlined in previous work. Results from this study show improvement in handling of…
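The retrieval evaluation described in the abstract can be illustrated with a minimal sketch. It assumes that image and prompt embeddings are already available as plain vectors, and it scores retrieval with cosine similarity and precision at k. The function names (`cosine`, `top_k`, `precision_at_k`) and the 2-D toy vectors are hypothetical. The vectors are chosen by hand to mimic a case where a negated prompt embeds close to its affirmative counterpart. They are not outputs of CheXagent or results from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_k(text_emb, image_embs, k):
    """Indices of the k images most similar to the prompt embedding."""
    ranked = sorted(
        range(len(image_embs)),
        key=lambda i: cosine(text_emb, image_embs[i]),
        reverse=True,
    )
    return ranked[:k]

def precision_at_k(text_emb, image_embs, relevant, k):
    """Fraction of the top-k retrieved images that are in the relevant set."""
    hits = top_k(text_emb, image_embs, k)
    return sum(1 for i in hits if i in relevant) / k

# Hand-made toy embeddings (assumption, for illustration only).
# Images 0 and 1 show the finding; images 2 and 3 do not.
image_embs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
with_finding = {0, 1}
without_finding = {2, 3}

# Hypothetical prompt embeddings: the negated prompt sits near the
# affirmative one, which is the failure pattern being illustrated.
affirmative_prompt = [1.0, 0.05]   # e.g. "pneumonia"
negated_prompt = [0.9, 0.2]        # e.g. "no pneumonia"

p_affirmative = precision_at_k(affirmative_prompt, image_embs, with_finding, k=2)
p_negated = precision_at_k(negated_prompt, image_embs, without_finding, k=2)
print(p_affirmative, p_negated)
```

In this toy setup the affirmative prompt retrieves the correct images while the negated prompt retrieves the same images, all of which are wrong for it. Comparing the two scores before and after fine-tuning is one simple way to track both the negation gain and any loss on positive prompts.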
Taxonomy
Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
