TL;DR
KEPIL improves zero-shot radiology disease detection by integrating curated medical knowledge and making predictions robust to prompt wording, improving stability and performance across seven benchmarks.
Contribution
The paper introduces KEPIL, a novel framework that combines knowledge integration with prompt engineering to improve the clinical robustness of vision-language models (VLMs).
Findings
Achieves state-of-the-art zero-shot inference performance on seven benchmarks.
Improves AUC by 6.37% on CheXpert under prompt variation.
Demonstrates the importance of structured knowledge and prompt design for reliable clinical VLMs.
Abstract
Vision-language models (VLMs) show promise for clinical decision support in radiology because they enable joint reasoning over radiological images and clinical text, thereby leveraging complementary clinical information. However, radiological findings are long-tailed in practice, leaving some conditions underrepresented and making zero-shot inference essential. Yet current CLIP-style medical VLMs are sensitive to prompt variations and often lack trustworthy external knowledge at inference time, which hinders reliable clinical deployment. We present KEPIL, a prompt-robust framework that integrates curated medical knowledge to stabilize zero-shot generalization. KEPIL comprises: (i) dynamic prompt enrichment using ontologies with LLM assistance, (ii) a semantic-aware contrastive loss aligning embeddings of equivalent prompt variants via a dual-embedding objective,…
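The abstract names two mechanisms relevant to prompt robustness (several equivalent prompt variants per finding, and a contrastive term that aligns their embeddings) without describing their implementation. The following is a minimal, hypothetical Python sketch, not KEPIL's code. It assumes precomputed embeddings, uses simple prompt ensembling (averaging the normalized variant embeddings) for CLIP-style zero-shot prediction, and substitutes a standard supervised-contrastive loss over text embeddings for the paper's semantic-aware contrastive loss. The function names, the temperature value, the toy labels and vectors, and the single-embedding formulation (the paper's dual-embedding objective is not specified here) are all assumptions.

```python
import math
from typing import Dict, List

# Hypothetical sketch: NOT the KEPIL implementation. Embeddings are plain lists of floats.
Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (CLIP-style models compare embeddings on the unit sphere)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def class_prototype(variants: List[Vector]) -> Vector:
    """Average the unit-normalized embeddings of equivalent prompt variants, then renormalize."""
    units = [normalize(v) for v in variants]
    mean = [sum(col) / len(units) for col in zip(*units)]
    return normalize(mean)


def zero_shot_predict(image_emb: Vector, prompts: Dict[str, List[Vector]]) -> str:
    """Return the label whose averaged prompt embedding is most similar to the image embedding."""
    return max(prompts, key=lambda label: cosine(image_emb, class_prototype(prompts[label])))


def variant_alignment_loss(prompts: Dict[str, List[Vector]], temperature: float = 0.1) -> float:
    """Supervised-contrastive loss over text embeddings (an assumed stand-in for the paper's loss).

    Variants of the same label are positives and variants of other labels are negatives,
    so the loss is lower when equivalent prompts cluster tightly and apart from other labels.
    """
    items = [(label, normalize(v)) for label, vs in prompts.items() for v in vs]
    total, count = 0.0, 0
    for i, (label_i, z_i) in enumerate(items):
        # Temperature-scaled similarities from anchor i to every other embedding.
        logits = {
            j: sum(a * b for a, b in zip(z_i, z_j)) / temperature
            for j, (_, z_j) in enumerate(items)
            if j != i
        }
        log_denom = math.log(sum(math.exp(s) for s in logits.values()))
        positives = [j for j in logits if items[j][0] == label_i]
        if not positives:
            continue  # a label with only one variant contributes no positive pair
        # Mean negative log-probability of the positives under the softmax over all others.
        total += -sum(logits[j] - log_denom for j in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # Toy 3-d embeddings for two hypothetical labels, two prompt variants each.
    prompts = {
        "pneumothorax": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0]],
        "no finding": [[0.0, 1.0, 0.1], [0.1, 0.9, 0.0]],
    }
    print(zero_shot_predict([0.95, 0.1, 0.0], prompts))  # prints "pneumothorax"
    print(round(variant_alignment_loss(prompts), 3))
```

In this sketch, averaging the variant embeddings means no single phrasing decides the prediction, which is one simple way to reduce prompt sensitivity at inference time. The contrastive term works at training time instead, encouraging the text encoder to map equivalent phrasings close together so that the variants agree in the first place.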
