Gene-induced Multimodal Pre-training for Image-omic Classification
Ting Jin, Xingran Xie, Renjie Wan, Qingli Li, Yan Wang

TL;DR
This paper introduces a multimodal pre-training framework that combines genomic data and histology images for cancer classification, addressing the challenges of patient-level feature extraction and high-order relevance modeling.
Contribution
It proposes a new gene-induced multimodal pre-training framework with a gene encoder, masked patch modeling, and triplet learning for improved image-omic classification.
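Triplet learning can be illustrated with the standard margin-based triplet loss. The sketch below is a generic version, not the paper's exact formulation: this summary does not say how GiMP builds its triplets, so the `anchor`, `positive`, and `negative` vectors, the Euclidean distance, and the `margin` value are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings, chosen only to make the arithmetic easy.
well_separated = triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
violated = triplet_margin_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

In the first call the positive is already much closer than the negative, so the loss is zero. In the second call the roles are swapped and the loss is positive, which would push the embeddings apart during training.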
Findings
Achieved 99.47% accuracy on the TCGA dataset
Demonstrated superiority over existing methods
Validated effectiveness of the proposed architecture
Abstract
Histology analysis of the tumor micro-environment integrated with genomic assays is the gold standard for most cancers in modern medicine. This paper proposes a Gene-induced Multimodal Pre-training (GiMP) framework, which jointly incorporates genomics and Whole Slide Images (WSIs) for classification tasks. Our work aims at dealing with the main challenges of multi-modality image-omic classification w.r.t. (1) the patient-level feature extraction difficulties from gigapixel WSIs and tens of thousands of genes, and (2) effective fusion considering high-order relevance modeling. Concretely, we first propose a group multi-head self-attention gene encoder to capture global structured features in gene expression cohorts. We design a masked patch modeling paradigm (MPM) to capture the latent pathological characteristics of different tissues. The mask strategy is randomly masking a fixed-length…
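The abstract is cut off before it finishes describing the mask strategy, so the following is only a generic sketch of random masking over a sequence, not the paper's procedure. The function name `random_mask`, the `num_masked` parameter, and the `mask_token` placeholder are hypothetical.

```python
import random


def random_mask(sequence, num_masked, mask_token=None, seed=None):
    """Replace `num_masked` randomly chosen positions with `mask_token`.

    Returns the masked copy and the sorted list of masked indices, which a
    reconstruction objective would use as its prediction targets.
    """
    if not 0 <= num_masked <= len(sequence):
        raise ValueError("num_masked must be between 0 and len(sequence)")
    rng = random.Random(seed)  # local RNG so the global state is untouched
    masked_idx = sorted(rng.sample(range(len(sequence)), num_masked))
    masked_set = set(masked_idx)
    masked = [mask_token if i in masked_set else item
              for i, item in enumerate(sequence)]
    return masked, masked_idx


# Hypothetical stand-in for ten patch embeddings (integers for readability).
patches = list(range(10))
masked_patches, masked_indices = random_mask(patches, 4, mask_token=-1, seed=0)
```

Returning the masked indices alongside the masked sequence keeps the original values recoverable, so a model can be trained to predict what was hidden.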
Taxonomy
Topics: AI in cancer detection · Cell Image Analysis Techniques · Gene expression and cancer classification
