MIRAGE: Retrieval and Generation of Multimodal Images and Texts for Medical Education

Miguel Diaz Benito; Cecilia Diana Albelda; Alvaro Garcia Martin; Jesus Bescos Cano; Marcos Escudero-Vinolo; Juan C. SanMiguel

arXiv:2605.04772·cs.CV·May 7, 2026

TL;DR

MIRAGE is a multimodal system that retrieves and generates medical images and text. It maps both modalities to a shared semantic space using publicly available models, supporting interactive learning for medical students.

Contribution

The paper introduces a multimodal retrieval and generation system for medical education, built on fine-tuned CLIP and diffusion models and accessible via Kaggle.

Findings

01. Enables retrieval of clinically relevant images from trustworthy sources.

02. Allows generation of synthetic medical images through prompts.

03. Provides enriched descriptions and visual comparisons of medical conditions.

Abstract

Access to diverse, well-annotated medical images with interactive learning tools is fundamental for training practitioners in medicine and related fields to improve their diagnostic skills and understanding of anatomical structures. While medical atlases are valuable, they are often impractical due to their size and lack of interactivity, whereas online image search may provide mislabeled or incomplete material. To address this, we propose MIRAGE, a multimodal medical text and image retrieval and generation system that allows users to find and generate clinically relevant images from trustworthy sources by mapping both text and images to a shared latent space, enabling semantically meaningful queries. The system is based on a fine-tuned medical version of CLIP (MedICaT-ROCO), trained with the ROCO dataset, obtained from PubMed Central. MIRAGE allows users to give prompts to retrieve…
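
The shared-latent-space retrieval described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings below are made-up three-dimensional toy vectors standing in for the outputs of a CLIP-style text and image encoder, and the names `cosine_similarity`, `retrieve`, `image_index`, and the image identifiers are hypothetical. It only shows the general idea that, once text and images live in the same space, a text query can rank images by cosine similarity.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding, image_index, top_k=2):
    """Return the top_k (image_id, score) pairs, highest cosine similarity first."""
    scored = [
        (image_id, cosine_similarity(query_embedding, embedding))
        for image_id, embedding in image_index.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy index: in a real system these vectors would come from
# an image encoder; here they are invented for illustration only.
image_index = {
    "chest_xray_01": [0.9, 0.1, 0.0],
    "brain_mri_07": [0.1, 0.9, 0.1],
    "knee_ct_03": [0.0, 0.2, 0.9],
}

# Hypothetical stand-in for the text encoder's output for a user prompt.
query_embedding = [0.8, 0.2, 0.1]

results = retrieve(query_embedding, image_index)
for image_id, score in results:
    print(f"{image_id}: {score:.3f}")
```

With these toy vectors, the query lies closest to `chest_xray_01`, so that entry is ranked first. In practice the index would hold one embedding per image and the ranking step would typically use a vectorized or approximate nearest-neighbor search rather than a Python loop.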

Code & Models

Repositories

http://www-vpu.eps.uam.es/mirage
