Disentangled Multi-modal Learning of Histology and Transcriptomics for Cancer Characterization
Yupei Zhang, Xiaofei Wang, Anran Liu, Lequan Yu, Chao Li

TL;DR
This paper introduces a disentangled multi-modal learning framework that combines histology and transcriptomics data for cancer diagnosis and prognosis. It targets three limitations of existing approaches: multi-modal heterogeneity, insufficient multi-scale integration, and reliance on paired data.
Contribution
The proposed framework decomposes whole-slide images (WSIs) and transcriptomes into tumor and microenvironment subspaces, aligns multi-scale transcriptomic signals, enables transcriptome-agnostic inference, and improves efficiency through token aggregation.
Findings
Outperforms state-of-the-art methods in cancer diagnosis and prognosis
Demonstrates robustness across multiple experimental settings
Reduces reliance on paired data for multi-modal learning
Abstract
Histopathology remains the gold standard for cancer diagnosis and prognosis. With the advent of transcriptome profiling, multi-modal learning combining transcriptomics with histology offers more comprehensive information. However, existing multi-modal approaches are challenged by intrinsic multi-modal heterogeneity, insufficient multi-scale integration, and reliance on paired data, restricting clinical applicability. To address these challenges, we propose a disentangled multi-modal framework with four contributions: 1) To mitigate multi-modal heterogeneity, we decompose WSIs and transcriptomes into tumor and microenvironment subspaces using a disentangled multi-modal fusion module, and introduce a confidence-guided gradient coordination strategy to balance subspace optimization. 2) To enhance multi-scale integration, we propose an inter-magnification gene-expression consistency…
