Advancing AI-Powered Medical Image Synthesis: Insights from MedVQA-GI Challenge Using CLIP, Fine-Tuned Stable Diffusion, and Dream-Booth + LoRA

Ojonugwa Oluwafemi Ejiga Peter; Md Mahmudur Rahman; and Fahmi Khalifa

arXiv:2502.20667 · cs.CV · August 12, 2025

TL;DR

This paper presents a novel AI system that uses fine-tuned diffusion models and prompt optimization to generate high-quality, diverse medical images from text. The aim is to improve diagnostic tools and to address a limitation of existing methods, which focus on static image analysis rather than generating images from textual descriptions.

Contribution

It introduces a new approach combining Stable Diffusion, DreamBooth, and LoRA for dynamic, scalable medical image synthesis from text prompts, surpassing previous models in quality and diversity.

Findings

1. Stable Diffusion outperforms CLIP and DreamBooth + LoRA in image quality.
2. Achieved low Fréchet Inception Distance (FID) scores, indicating high fidelity.
3. Generated diverse images with high Inception Scores.
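As background for the two metrics named in the Findings, here is a minimal NumPy sketch of how they are usually defined. FID fits a Gaussian to each feature set and computes ||mu_r - mu_g||^2 + Tr(C_r + C_g - 2 (C_r C_g)^(1/2)), so lower is better. The Inception Score is exp(E_x[KL(p(y|x) || p(y))]), which is high when each image is classified confidently and the predicted classes are spread out. The sketch assumes that image features (typically pooled Inception-v3 activations) and class probabilities (typically Inception-v3 softmax outputs) have already been extracted. The function names are hypothetical, this is not the paper's evaluation code, and the Inception Score here uses a single split, whereas many reference implementations average over several splits.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # clamp tiny negative eigenvalues caused by round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def frechet_inception_distance(feats_real, feats_gen):
    """FID between two feature sets of shape (n_samples, dim); lower means closer distributions."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((C_r C_g)^(1/2)) equals the sum of square roots of the eigenvalues of the
    # symmetric matrix C_r^(1/2) C_g C_r^(1/2), which has the same spectrum as C_r C_g.
    sqrt_r = _sqrtm_psd(cov_r)
    cross_eigs = np.clip(np.linalg.eigvalsh(sqrt_r @ cov_g @ sqrt_r), 0.0, None)
    tr_covmean = np.sqrt(cross_eigs).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r) + np.trace(cov_g) - 2.0 * tr_covmean)


def inception_score(probs, eps=1e-12):
    """Inception Score exp(E_x[KL(p(y|x) || p(y))]) from probabilities of shape (n_images, n_classes)."""
    p_y = probs.mean(axis=0, keepdims=True)  # marginal class distribution p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.normal(size=(500, 4))
    print(frechet_inception_distance(real, real + 3.0))  # mean shift only: about 4 * 3^2 = 36
    print(inception_score(np.tile(np.eye(4), (5, 1))))    # confident, balanced predictions: about 4
```

The two toy calls show the intended behavior: shifting every feature by a constant leaves the covariances unchanged, so FID reduces to the squared mean distance, and perfectly confident predictions spread evenly over four classes give an Inception Score equal to the number of classes.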

Abstract

The MedVQA-GI challenge addresses the integration of AI-driven text-to-image generative models into medical diagnostics, aiming to enhance diagnostic capabilities through synthetic image generation. Existing methods focus primarily on static image analysis and cannot dynamically generate medical images from textual descriptions. This study aims to partially close this gap by introducing a novel approach based on fine-tuned generative models that produce dynamic, scalable, and precise images from textual descriptions. In particular, our system integrates fine-tuned Stable Diffusion and DreamBooth models, together with Low-Rank Adaptation (LoRA), to generate high-fidelity medical images. The challenge comprises two sub-tasks: image synthesis (IS) and optimal prompt generation (OPG). The former creates medical images from textual prompts, whereas the latter provides prompts that produce…
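The abstract relies on Low-Rank Adaptation (LoRA), which freezes a pretrained weight matrix W and trains only a low-rank update. Below is a minimal NumPy sketch of that idea for a single linear layer, assuming the standard LoRA formulation: the update is (alpha / r) * B A, with A randomly initialized and B initialized to zero so the adapted layer starts out identical to the pretrained one. The class name, rank, alpha, and initialization scale are illustrative assumptions, not the paper's settings. In diffusion-model fine-tuning, adapters like this are commonly attached to attention projection layers; which layers the authors adapted is not stated in this summary.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer y = x W^T + b plus a trainable low-rank update (alpha / r) * B A."""

    def __init__(self, weight, bias, rank=4, alpha=8.0, seed=0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight  # frozen pretrained weight, shape (d_out, d_in)
        self.bias = bias      # frozen pretrained bias, shape (d_out,)
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, rank))                    # trainable up-projection, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x):
        """x has shape (batch, d_in); returns shape (batch, d_out)."""
        base = x @ self.weight.T + self.bias
        # Low-rank path: project down to `rank` dimensions with A, then back up with B.
        return base + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only A and B would be updated during fine-tuning; W and b stay frozen.
        return self.A.size + self.B.size

    def merged_weight(self):
        # After fine-tuning, the update can be folded into W so inference costs nothing extra.
        return self.weight + self.scale * self.B @ self.A


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    layer = LoRALinear(rng.normal(size=(64, 32)), np.zeros(64), rank=4)
    print(layer.trainable_params(), layer.weight.size)  # 384 2048
```

For a 64 × 32 layer with rank 4, only 384 parameters are trainable against 2,048 frozen ones, which is why LoRA makes fine-tuning large models such as Stable Diffusion cheap in memory and storage.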