Multi-modal Vision Pre-training for Medical Image Analysis
Shaohao Rui, Lingzhi Chen, Zhenyu Tang, Lilong Wang, Mianxin Liu, Shaoting Zhang, Xiaosong Wang

TL;DR
This paper introduces a multi-modal vision pre-training approach for medical image analysis that leverages cross-modal correlations in multi-parametric MRI scans, significantly improving performance on various downstream tasks.
Contribution
It proposes a novel multi-modal pre-training framework with three proxy tasks to learn cross-modality representations from large-scale brain MRI data, addressing limitations of uni-modal self-supervision.
Findings
Achieved Dice Score improvements of 0.28%-14.47% across six segmentation benchmarks.
Realized accuracy boosts of 0.65%-18.07% in four image classification tasks.
Demonstrated superior performance over state-of-the-art pre-training methods.
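For readers unfamiliar with the segmentation metric cited above, the Dice score between a predicted mask and a reference mask is 2|A∩B| / (|A| + |B|). The sketch below is a minimal NumPy illustration of that standard definition; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape.

    Hypothetical helper for illustration; the paper's evaluation code
    is not shown on this page and may differ (e.g. per-class averaging).
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / denom

# Toy example: one overlapping voxel, two foreground voxels in each mask.
pred = [[1, 1, 0], [0, 0, 0]]
target = [[1, 0, 0], [1, 0, 0]]
print(dice_score(pred, target))
```

Reported improvements such as "0.28%-14.47%" are differences in this score expressed in percentage points.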
Abstract
Self-supervised learning has greatly facilitated medical image analysis by reducing the training data requirement for real-world applications. Current paradigms predominantly rely on self-supervision within uni-modal image data, thereby neglecting the inter-modal correlations essential for effective learning of cross-modal image representations. This limitation is particularly significant for naturally grouped multi-modal data, e.g., multi-parametric MRI scans for a patient undergoing various functional imaging protocols in the same study. To bridge this gap, we propose a novel multi-modal image pre-training approach with three proxy tasks to facilitate the learning of cross-modality representations and correlations using multi-modal brain MRI scans (over 2.4 million images in 16,022 scans of 3,755 patients), i.e., cross-modal image reconstruction, modality-aware contrastive learning, and…
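As a rough illustration of how a contrastive objective can exploit naturally grouped multi-modal data, the sketch below computes a generic symmetric InfoNCE loss in NumPy, treating embeddings of two MRI modalities from the same study as a positive pair. This is an assumption-laden sketch of a common contrastive loss, not the paper's modality-aware formulation, which is not described on this page; the function name `info_nce`, the temperature value, and the random embeddings are all hypothetical.

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """Symmetric InfoNCE loss between two batches of embeddings.

    Row i of z_a and row i of z_b are assumed to come from the same
    study (positive pair); all other rows in the batch act as negatives.
    """
    # L2-normalise so the dot product is cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # shape (N, N)

    def diag_cross_entropy(l):
        # Numerically stable log-softmax over each row, then pick the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the a->b and b->a directions.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))

# Toy check with random stand-in embeddings (not real MRI features).
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_aligned = info_nce(z, z)          # every pair matches
loss_shuffled = info_nce(z, z[::-1])   # pairs deliberately mismatched
print(loss_aligned < loss_shuffled)
```

The loss is low when paired embeddings agree and high when they are mismatched, which is the property a cross-modal pre-training objective of this kind relies on.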
Taxonomy
Topics: Brain Tumor Detection and Classification · Medical Imaging and Analysis
Methods: Contrastive Learning
