Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains
Pierre Chambon, Christian Bluethgen, Curtis P. Langlotz, Akshay Chaudhari

TL;DR
This paper adapts large pretrained vision-language models, specifically Stable Diffusion, to generate and manipulate medical images, addressing domain shift issues and improving clinical relevance through fine-tuning and evaluation.
Contribution
It demonstrates how to fine-tune the Stable Diffusion model for medical imaging, enabling realistic abnormality insertion while preserving diagnostic features.
Findings
Improved image quality metrics over baseline models
Radiologist evaluations confirm clinical relevance
Model maintains 95% abnormality detection accuracy
Abstract
Multi-modal foundation models are typically trained on millions of pairs of natural images and text captions, frequently obtained through web-crawling approaches. Although such models exhibit excellent generative capabilities, they do not typically generalize well to specific domains such as medical images that have fundamentally shifted distributions compared to natural images. Building generative models for medical images that faithfully depict clinical context may help alleviate the paucity of healthcare datasets. Thus, in this study, we seek to research and expand the representational capabilities of large pretrained foundation models to medical concepts, specifically for leveraging the Stable Diffusion model to generate domain-specific images found in medical imaging. We explore the sub-components of the Stable Diffusion pipeline (the variational autoencoder, the U-Net and the…
Taxonomy
Topics: Radiomics and Machine Learning in Medical Imaging · AI in cancer detection · Colorectal Cancer Screening and Detection
Methods: Concatenated Skip Connection · Max Pooling · Convolution · U-Net · Diffusion
