Bringing CLIP to the Clinic: Dynamic Soft Labels and Negation-Aware Learning for Medical Analysis
Hanbin Ko, Chang-Min Park

TL;DR
This paper enhances medical vision-language models by integrating dynamic soft labels, negation-aware learning, and graphical alignment, significantly improving clinical language understanding and performance in medical imaging tasks.
Contribution
The paper introduces a framework that combines clinically-enhanced dynamic soft labels, negation-based hard negatives, and medical graphical alignment to improve how medical CLIP models understand clinical language.
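To make the idea of a negation-based hard negative concrete, here is a minimal sketch. It is not the authors' implementation: the caption template, the function names `caption_pair` and `loss_with_hard_negative`, and the choice to append the negated caption as one extra candidate in an InfoNCE-style loss are all assumptions made for illustration.

```python
import math

def caption_pair(finding, present):
    """Return (matching caption, negation-based hard negative) for one finding.

    The "There is ..." template is a hypothetical stand-in for real report text.
    """
    affirmed = f"There is {finding}."
    negated = f"There is no {finding}."
    return (affirmed, negated) if present else (negated, affirmed)

def loss_with_hard_negative(sim_to_texts, target_index, sim_to_hard_negative):
    """InfoNCE-style loss for one image, with the hard negative as an extra candidate.

    sim_to_texts: similarities between the image and every caption in the batch.
    target_index: position of the image's own caption in sim_to_texts.
    sim_to_hard_negative: similarity between the image and the negated caption.
    """
    scores = list(sim_to_texts) + [sim_to_hard_negative]
    m = max(scores)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_norm - scores[target_index]

match, hard_negative = caption_pair("pleural effusion", present=True)
confused = loss_with_hard_negative([2.0, 0.0], 0, sim_to_hard_negative=2.0)
separated = loss_with_hard_negative([2.0, 0.0], 0, sim_to_hard_negative=-5.0)
```

In this sketch the loss is higher when the image embedding is as close to the negated caption as to the correct one (`confused`) than when the two are well separated (`separated`), which is the pressure a hard negative is meant to add.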
Findings
Achieved state-of-the-art results in zero-shot and fine-tuned classification.
Demonstrated improved understanding of negation and clinical details.
Generalized across multiple contrastive learning frameworks.
Abstract
The development of large-scale image-text pair datasets has significantly advanced self-supervised learning in Vision-Language Processing (VLP). However, directly applying general-domain architectures such as CLIP to medical data presents challenges, particularly in handling negations and addressing the inherent data imbalance of medical datasets. To address these issues, we propose a novel approach that integrates clinically-enhanced dynamic soft labels and medical graphical alignment, thereby improving clinical comprehension and the applicability of contrastive loss in medical contexts. Furthermore, we introduce negation-based hard negatives to deepen the model's understanding of the complexities of clinical language. Our approach is easily integrated into the medical CLIP training pipeline and achieves state-of-the-art performance across multiple tasks, including zero-shot,…
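The role of soft labels in a contrastive loss can be illustrated with a short sketch. This is an illustration under stated assumptions, not the paper's method: the `label_sim` matrix (a hypothetical clinical similarity between samples), the `label_temp` parameter, and the softmax used to turn similarities into targets are all invented here, and only the image-to-text direction is shown. How the paper computes its labels and makes them dynamic is not described in the text above.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label_contrastive_loss(logits, label_sim, label_temp=0.1):
    """Cross-entropy between softmax(logits) and soft targets (image-to-text only).

    logits[i][j]: scaled similarity between image i and text j.
    label_sim[i][j]: hypothetical clinical similarity between samples i and j,
    for example overlap of reported findings; higher means more alike.
    label_temp: hypothetical temperature; small values approach one-hot targets.
    """
    loss = 0.0
    for logit_row, sim_row in zip(logits, label_sim):
        targets = softmax([s / label_temp for s in sim_row])
        probs = softmax(logit_row)
        loss -= sum(t * math.log(p) for t, p in zip(targets, probs))
    return loss / len(logits)

logits = [[2.0, 0.0], [0.0, 2.0]]
# Near one-hot targets: reduces to the usual CLIP-style cross-entropy.
hard = soft_label_contrastive_loss(logits, [[1.0, 0.0], [0.0, 1.0]], label_temp=0.01)
# Two clinically identical samples: each text is an equally valid target.
soft = soft_label_contrastive_loss(logits, [[1.0, 1.0], [1.0, 1.0]])
```

With one-hot targets every other sample in the batch is treated as a negative, even when two reports describe the same findings. Soft targets spread the probability mass over clinically similar samples, which is one way to ease the data-imbalance problem the abstract raises.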
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
