Anatomy-Grounded Weakly Supervised Prompt Tuning for Chest X-ray Latent Diffusion Models
Konstantinos Vilouras, Ilias Stogiannidis, Junyu Yan, Alison Q. O'Neil, Sotirios A. Tsaftaris

TL;DR
This paper introduces a fine-tuning approach for latent diffusion models that improves the alignment between radiology reports and chest X-ray images, enabling better medical image understanding and outperforming previous methods on benchmark datasets.
Contribution
The paper presents a novel anatomy-grounded, weakly supervised prompt tuning method for latent diffusion models in medical imaging, which enhances multi-modal alignment for chest X-ray analysis.
Findings
Sets a new state of the art on the MS-CXR benchmark
Demonstrates robust out-of-distribution performance
Improves alignment between reports and images
Abstract
Latent Diffusion Models have shown remarkable results in text-guided image synthesis in recent years. In the domain of natural (RGB) images, recent works have shown that such models can be adapted to various vision-language downstream tasks with little to no supervision involved. In contrast, text-to-image Latent Diffusion Models remain relatively underexplored in the field of medical imaging, primarily due to limited data availability (e.g., due to privacy concerns). In this work, focusing on the chest X-ray modality, we first demonstrate that a standard text-conditioned Latent Diffusion Model has not learned to align clinically relevant information in free-text radiology reports with the corresponding areas of the given scan. Then, to alleviate this issue, we propose a fine-tuning framework to improve multi-modal alignment in a pre-trained model such that it can be efficiently…
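This page does not describe the authors' method in detail, so the sketch below is only a generic illustration of what prompt tuning means for a text-conditioned latent diffusion model, not the paper's implementation. It assumes a frozen text encoder output and frozen cross-attention projections, prepends a few learnable prompt vectors to the conditioning sequence (a simplification: prompts are often inserted before the text encoder instead), and computes the per-token cross-attention maps over the image latent grid that are commonly inspected to judge report-to-image alignment. All dimensions and the names `D`, `N_PROMPT`, `N_PATCH`, `W_q`, `W_k`, `condition`, and `cross_attention_maps` are hypothetical, random arrays stand in for real weights, and the weak supervision and anatomy grounding from the title are not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 16          # embedding width (hypothetical)
N_PROMPT = 4    # number of learnable prompt tokens (hypothetical)
N_PATCH = 64    # image latent positions, i.e. an 8x8 latent grid (hypothetical)

# Frozen pieces: random stand-ins for the pre-trained text encoder output
# and the cross-attention projection weights of the denoising U-Net.
frozen_text = rng.standard_normal((10, D))  # 10 report tokens
W_q = rng.standard_normal((D, D)) / np.sqrt(D)
W_k = rng.standard_normal((D, D)) / np.sqrt(D)

# Under prompt tuning, these vectors are the only trainable parameters.
prompt = 0.02 * rng.standard_normal((N_PROMPT, D))


def condition(prompt, text):
    """Prepend the learnable prompt vectors to the frozen text embeddings."""
    return np.concatenate([prompt, text], axis=0)


def cross_attention_maps(latents, context):
    """Softmax attention of each latent position over the conditioning tokens."""
    q = latents @ W_q
    k = context @ W_k
    scores = q @ k.T / np.sqrt(D)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    return attn / attn.sum(axis=1, keepdims=True)  # shape (N_PATCH, n_tokens)


latents = rng.standard_normal((N_PATCH, D))
ctx = condition(prompt, frozen_text)
attn = cross_attention_maps(latents, ctx)

# Heatmap over the 8x8 latent grid for the first report token
# (it sits right after the prompt tokens in the conditioning sequence).
token_map = attn[:, N_PROMPT].reshape(8, 8)

trainable = prompt.size
frozen = frozen_text.size + W_q.size + W_k.size
print(ctx.shape, attn.shape, token_map.shape, trainable, frozen)
# prints: (14, 16) (64, 14) (8, 8) 64 672
```

In an actual training loop, only `prompt` would receive gradient updates from the fine-tuning objective while the encoder and U-Net weights stay frozen, which is what makes prompt tuning parameter-efficient compared with full fine-tuning.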
Taxonomy
Topics: COVID-19 diagnosis using AI · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Diffusion · ALIGN · Latent Diffusion Model
