Can Medical Vision-Language Pre-training Succeed with Purely Synthetic Data?
Che Liu, Zhongwei Wan, Haozhe Wang, Yinda Chen, Talha Qaiser, Chen Jin, Fariba Yousefi, Nikolay Burlutskiy, Rossella Arcucci

TL;DR
This study shows that medical vision-language pre-training (MedVLP) models can be trained effectively on purely synthetic data generated by off-the-shelf models. Such models outperform models trained on real data in zero-shot tasks, and combining synthetic with real data improves performance further.
Contribution
The paper introduces a pipeline for creating high-quality synthetic medical image-text datasets and shows that models trained on this synthetic data can surpass those trained on real data in various tasks.
Findings
Models trained on synthetic data outperform models trained on real data in zero-shot classification by 3.8% in averaged AUC.
Combining synthetic and real data yields a further improvement of 9.07%.
Models trained on synthetic or mixed data also outperform models trained on real data in zero-shot grounding and in fine-tuned classification and segmentation.
Abstract
Medical Vision-Language Pre-training (MedVLP) has made significant progress in enabling zero-shot tasks for medical image understanding. However, training MedVLP models typically requires large-scale datasets with paired, high-quality image-text data, which are scarce in the medical domain. Recent advancements in Large Language Models (LLMs) and diffusion models have made it possible to generate large-scale synthetic image-text pairs. This raises the question: "Can MedVLP succeed using purely synthetic data?" To address this, we use off-the-shelf generative models to create synthetic radiology reports and paired Chest X-ray (CXR) images, and propose an automated pipeline to build a diverse, high-quality synthetic dataset, enabling a rigorous study that isolates model and training settings, focusing entirely from the data perspective. Our results show that MedVLP models trained…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Natural Language Processing Techniques
Methods: Diffusion
