MGI: Multimodal Contrastive pre-training of Genomic and Medical Imaging
Jiaying Zhou, Mingzhou Jiang, Junde Wu, Jiayuan Zhu, Ziyue Wang, Yueming Jin

TL;DR
This paper introduces MGI, a multimodal contrastive pre-training framework that jointly models genomic and medical imaging data, significantly improving downstream tumor segmentation performance.
Contribution
The paper presents a novel multimodal pre-training approach combining genomic and imaging data using contrastive learning and specialized encoders, addressing previous unimodal limitations.
Findings
Outperforms existing methods on tumor segmentation tasks
Effectively models long genomic sequences with Mamba encoder
Demonstrates the benefit of joint genomic and imaging pre-training
Abstract
Medicine is inherently a multimodal discipline. Medical images can reflect the pathological changes of cancer and tumors, while the expression of specific genes can influence their morphological characteristics. However, most deep learning models employed for these medical tasks are unimodal, making predictions using either image data or genomic data exclusively. In this paper, we propose a multimodal pre-training framework that jointly incorporates genomics and medical images for downstream tasks. To address the high computational complexity and the difficulty of capturing long-range dependencies when modeling gene sequences with MLP or Transformer architectures, we utilize Mamba to model these long genomic sequences. We align medical images and genes using a self-supervised contrastive learning approach that combines Mamba as a genetic encoder and the Vision Transformer…
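The abstract excerpt does not state the exact contrastive objective, so the following is only a minimal sketch of how paired image and gene embeddings could be aligned, assuming a CLIP-style symmetric InfoNCE loss. The function names, the temperature value of 0.07, and the toy vectors are all hypothetical; in the actual framework the embeddings would come from the Vision Transformer and Mamba encoders rather than hand-written lists.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (small epsilon avoids division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec)) + 1e-12
    return [v / norm for v in vec]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)

def symmetric_contrastive_loss(image_emb, gene_emb, temperature=0.07):
    """Assumed CLIP-style symmetric InfoNCE over a batch of paired embeddings.

    image_emb[i] and gene_emb[i] are treated as a matching (positive) pair;
    every other combination in the batch is treated as a negative.
    """
    img = [l2_normalize(v) for v in image_emb]
    gen = [l2_normalize(v) for v in gene_emb]
    # logits[i][j] = cosine similarity of image i and gene j, divided by temperature
    logits = [[sum(a * b for a, b in zip(u, w)) / temperature for w in gen]
              for u in img]
    # Transpose to score gene -> image retrieval as well
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))

# Toy batch of two pairs (stand-ins for encoder outputs)
image_batch = [[1.0, 0.0], [0.0, 1.0]]
gene_batch = [[2.0, 0.1], [0.1, 3.0]]

loss_aligned = symmetric_contrastive_loss(image_batch, gene_batch)
loss_shuffled = symmetric_contrastive_loss(image_batch, gene_batch[::-1])
print(loss_aligned, loss_shuffled)
```

In this toy batch the correctly paired embeddings give a much lower loss than the shuffled pairing, which is the behavior a contrastive pre-training objective of this kind relies on.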
Taxonomy
Topics: Genetics, Bioinformatics, and Biomedical Research
Methods: Attention Is All You Need · Softmax · Layer Normalization · Contrastive Learning · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer
