3D Modality-Aware Pre-training for Vision-Language Model in MRI Multi-organ Abnormality Detection

Haowen Zhu; Ning Yin; Xiaogen Zhou

arXiv:2602.23652·cs.CV·March 4, 2026

Open Access

TL;DR

MedMAP is a novel pretraining framework that improves vision-language alignment and feature fusion in 3D MRI for multi-organ abnormality detection, significantly outperforming existing models.

Contribution

Introduces MedMAP, a modality-aware pretraining approach specifically designed for 3D MRI vision-language tasks, addressing modality-specific challenges.

Findings

01. MedMAP outperforms existing VLMs on 3D MRI abnormality detection.

02. The curated MedMoM-MRI3D dataset contains 7,392 3D MRI volume-report pairs.

03. During pretraining, the modality-aware encoders capture the joint modality distribution.

Abstract

Vision-language models (VLMs) show strong potential for complex diagnostic tasks in medical imaging. However, applying VLMs to multi-organ medical imaging introduces two principal challenges: (1) modality-specific vision-language alignment and (2) cross-modal feature fusion. In this work, we propose MedMAP, a Medical Modality-Aware Pretraining framework that enhances vision-language representation learning in 3D MRI. MedMAP comprises a modality-aware vision-language alignment stage and a fine-tuning stage for multi-organ abnormality detection. During the pre-training stage, the modality-aware encoders implicitly capture the joint modality distribution and improve alignment between visual and textual representations. We then fine-tune the pre-trained vision encoders (while keeping the text encoder frozen) for downstream tasks. To this end, we curated MedMoM-MRI3D, comprising 7,392 3D MRI…
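The abstract does not state which objective drives the vision-language alignment stage. As a rough illustration only, the sketch below assumes a CLIP-style symmetric contrastive loss, a common choice for aligning paired image and report embeddings; it is not taken from the paper. The function names (`symmetric_contrastive_loss`, `cross_entropy_diagonal`), the `temperature` value, and the toy embeddings are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Assumed CLIP-style loss: matched (volume, report) pairs share an index.

    The temperature default is an illustrative assumption, not a value
    reported in the abstract.
    """
    img = [l2_normalize(v) for v in image_embs]
    txt = [l2_normalize(v) for v in text_embs]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(i_vec, t_vec)) / temperature for t_vec in txt]
        for i_vec in img
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy embeddings: two MRI volumes and two reports in a 2-D embedding space.
aligned_loss = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                          [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                           [[0.0, 1.0], [1.0, 0.0]])
```

Under this assumed objective, correctly paired embeddings yield a loss near zero, while shuffled pairs are penalized heavily. In a fine-tuning stage like the one the abstract describes, the text encoder would stay frozen and only the vision encoders would continue to receive gradient updates.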

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Advanced Neural Network Applications