Multi-modal Vision Pre-training for Medical Image Analysis

Shaohao Rui; Lingzhi Chen; Zhenyu Tang; Lilong Wang; Mianxin Liu; Shaoting Zhang; Xiaosong Wang

arXiv:2410.10604 · cs.CV · March 31, 2025

TL;DR

This paper introduces a multi-modal vision pre-training approach for medical image analysis that leverages cross-modal correlations in multi-parametric MRI scans, significantly improving performance on various downstream tasks.

Contribution

It proposes a novel multi-modal pre-training framework with three proxy tasks to learn cross-modality representations from large-scale brain MRI data, addressing limitations of uni-modal self-supervision.

Findings

01 Achieved Dice score improvements of 0.28%-14.47% across six segmentation benchmarks.

02 Realized accuracy gains of 0.65%-18.07% across four image classification tasks.

03 Demonstrated superior performance over state-of-the-art pre-training methods.
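
For reference, the Dice score used for the segmentation benchmarks measures the overlap between a predicted mask A and a ground-truth mask B as 2|A∩B| / (|A| + |B|). The sketch below is a minimal NumPy illustration of that standard definition; the function name `dice_score` and the choice to return 1.0 when both masks are empty are assumptions made here, not details taken from the paper or its repository.

```python
import numpy as np

def dice_score(pred, target):
    """Dice overlap between two binary masks of the same shape.

    Hypothetical helper for illustration only. Returns 1.0 when both
    masks are empty (an assumed convention; others return 0 or NaN).
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement
    return 2.0 * intersection / denom

# Toy example: 2 overlapping voxels, 3 + 3 foreground voxels -> 4 / 6
pred = [[1, 1, 0],
        [0, 1, 0]]
target = [[1, 0, 0],
          [0, 1, 1]]
print(dice_score(pred, target))
```

The page does not say whether the reported 0.28%-14.47% gains are absolute differences in this score or relative changes, so the sketch makes no assumption about that.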

Abstract

Self-supervised learning has greatly facilitated medical image analysis by reducing the training data requirement for real-world applications. Current paradigms predominantly rely on self-supervision within uni-modal image data, thereby neglecting the inter-modal correlations essential for effective learning of cross-modal image representations. This limitation is particularly significant for naturally grouped multi-modal data, e.g., multi-parametric MRI scans for a patient undergoing various functional imaging protocols in the same study. To bridge this gap, we conduct a novel multi-modal image pre-training with three proxy tasks to facilitate the learning of cross-modality representations and correlations using multi-modal brain MRI scans (over 2.4 million images in 16,022 scans of 3,755 patients), i.e., cross-modal image reconstruction, modality-aware contrastive learning, and…
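
The abstract names contrastive learning across modalities as one of the proxy tasks but, as truncated here, gives no formulation. As a rough illustration of the general idea only, the sketch below computes a generic symmetric InfoNCE loss between embeddings of two MRI modalities, where row i of each matrix is assumed to come from the same scan. This is not the authors' modality-aware objective; the function name `cross_modal_info_nce`, the temperature value, and the pairing scheme are all assumptions made for this example.

```python
import numpy as np

def _mean_diag_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

def cross_modal_info_nce(z_a, z_b, temperature=0.1):
    """Generic symmetric InfoNCE between two embedding matrices.

    z_a, z_b: arrays of shape (batch, dim). Row i of z_a and row i of z_b
    are assumed to be embeddings of two modalities of the same scan
    (positive pair); all other rows in the batch act as negatives.
    Hypothetical sketch, not the paper's loss.
    """
    z_a = np.asarray(z_a, dtype=float)
    z_b = np.asarray(z_b, dtype=float)
    # L2-normalize so the dot product is cosine similarity
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)

    logits = z_a @ z_b.T / temperature
    # Average the a->b and b->a directions
    return 0.5 * (_mean_diag_cross_entropy(logits)
                  + _mean_diag_cross_entropy(logits.T))

# Toy check: matched pairs should score a lower loss than mismatched ones
z = np.eye(4)
print(cross_modal_info_nce(z, z) < cross_modal_info_nce(z, np.roll(z, 1, axis=0)))
```

In practice the embeddings would come from an encoder applied to each modality; the linked repository is the place to look for the actual training objective.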

Code & Models

Repositories

shaohao011/BrainMVP (PyTorch)

Taxonomy

Topics: Brain Tumor Detection and Classification · Medical Imaging and Analysis

Methods: Contrastive Learning