Zero-Shot Medical Phrase Grounding with Off-the-shelf Diffusion Models
Konstantinos Vilouras, Pedro Sanchez, Alison Q. O'Neil, Sotirios A. Tsaftaris

TL;DR
This paper demonstrates that off-the-shelf Latent Diffusion Models can be effectively used for zero-shot medical phrase grounding, localizing pathological regions in scans using only free-text reports without additional training.
Contribution
It introduces a novel zero-shot approach leveraging pre-trained diffusion models for medical phrase grounding, avoiding the need for task-specific training or annotations.
Findings
Achieves performance competitive with state-of-the-art methods on chest X-ray datasets
Outperforms existing methods on the mean IoU and AUC-ROC metrics
Shows the potential of generative models for medical localization tasks
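The findings report mean IoU and AUC-ROC, but this summary does not say how heatmaps are thresholded or how ground-truth boxes are rasterized. The sketch below assumes a common setup: a heatmap normalized to [0, 1], a fixed threshold for IoU, half-open box coordinates, and pixel-level AUC-ROC computed with the rank-sum (Mann-Whitney U) formula. The helpers `box_to_mask`, `iou`, `mean_iou` and `auc_roc` are hypothetical names, not the paper's evaluation code.

```python
import numpy as np

def box_to_mask(box, shape):
    """Rasterize a box (x0, y0, x1, y1), half-open pixel coords, into a boolean mask."""
    x0, y0, x1, y1 = box
    mask = np.zeros(shape, dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask

def iou(heatmap, gt_mask, threshold=0.5):
    """IoU between the thresholded heatmap (assumed in [0, 1]) and the ground-truth mask."""
    pred = heatmap >= threshold
    inter = np.logical_and(pred, gt_mask).sum()
    union = np.logical_or(pred, gt_mask).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def mean_iou(pairs, threshold=0.5):
    """Average IoU over (heatmap, gt_mask) pairs, e.g. one pair per image-phrase sample."""
    return float(np.mean([iou(h, m, threshold) for h, m in pairs]))

def auc_roc(heatmap, gt_mask):
    """Pixel-level AUC-ROC via the rank-sum formula, using average ranks for ties."""
    scores = heatmap.ravel()
    labels = gt_mask.ravel()
    n_pos = labels.sum()
    n_neg = labels.size - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC-ROC needs both positive and negative pixels")
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(scores.size, dtype=float)
    i = 0
    while i < scores.size:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < scores.size and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0  # average 1-based rank
        i = j + 1
    rank_sum_pos = ranks[labels].sum()
    return float((rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg))
```

A heatmap that exactly matches the box scores IoU 1.0 and AUC-ROC 1.0, while a constant heatmap scores AUC-ROC 0.5. Note that IoU depends on the chosen threshold whereas AUC-ROC does not, which is one reason to report both.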
Abstract
Localizing the exact pathological regions in a given medical scan is an important imaging problem that traditionally requires a large amount of bounding box ground truth annotations to be accurately solved. However, there exist alternative, potentially weaker, forms of supervision, such as accompanying free-text reports, which are readily available. The task of performing localization with textual guidance is commonly referred to as phrase grounding. In this work, we use a publicly available Foundation Model, namely the Latent Diffusion Model, to perform this challenging task. This choice is supported by the fact that the Latent Diffusion Model, despite being generative in nature, contains cross-attention mechanisms that implicitly align visual and textual features, thus leading to intermediate representations that are suitable for the task at hand. In addition, we aim to perform this…
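The abstract's central idea is that the cross-attention layers of the Latent Diffusion Model align image locations with text tokens. A minimal sketch of turning such attention maps into a phrase-level heatmap follows. It assumes you have already extracted, from the denoising network, per-layer attention probabilities of shape (heads, h*w, num_tokens) at square resolutions. The averaging over heads, tokens and layers, the nearest-neighbor upsampling and the min-max normalization are generic choices, not necessarily the paper's exact procedure, and `phrase_heatmap` is a hypothetical helper.

```python
import numpy as np

def phrase_heatmap(attn_maps, token_ids, out_size):
    """
    attn_maps: list of arrays, each (heads, h*w, num_tokens), softmax-normalized
               over tokens, from cross-attention layers at square resolutions h = w.
    token_ids: indices of the text tokens that make up the phrase of interest.
    out_size:  side length of the output map; must be a multiple of every layer's h.
    """
    acc = np.zeros((out_size, out_size))
    for attn in attn_maps:
        _, hw, _ = attn.shape
        side = int(round(hw ** 0.5))
        assert side * side == hw and out_size % side == 0
        # Average over heads and the phrase's tokens: one score per spatial location.
        m = attn[:, :, token_ids].mean(axis=(0, 2)).reshape(side, side)
        # Nearest-neighbor upsampling to the common output resolution.
        scale = out_size // side
        acc += np.repeat(np.repeat(m, scale, axis=0), scale, axis=1)
    acc /= len(attn_maps)
    # Min-max normalize to [0, 1] so the map can be thresholded or scored.
    lo, hi = acc.min(), acc.max()
    return (acc - lo) / (hi - lo) if hi > lo else np.zeros_like(acc)
```

In this sketch, a phrase such as a finding mentioned in the report would be mapped to its token indices, and the resulting map could then be compared against a ground-truth box with IoU or AUC-ROC.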
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Advanced Text Analysis Techniques
Methods: Latent Diffusion Model · ALIGN · Diffusion
