Disentangled Multi-modal Learning of Histology and Transcriptomics for Cancer Characterization

Yupei Zhang; Xiaofei Wang; Anran Liu; Lequan Yu; Chao Li

arXiv:2508.16479·eess.IV·March 3, 2026

TL;DR

This paper introduces a disentangled multi-modal learning framework that combines histology and transcriptomics data for cancer diagnosis and prognosis, addressing multi-modal heterogeneity, insufficient multi-scale integration, and reliance on paired data.

Contribution

The proposed framework decomposes whole-slide images (WSIs) and transcriptomes into tumor and microenvironment subspaces, aligns multi-scale transcriptomic signals, enables transcriptome-agnostic inference, and improves efficiency through token aggregation.

Findings

01. Outperforms state-of-the-art methods in cancer diagnosis and prognosis

02. Demonstrates robustness across multiple experimental settings

03. Reduces reliance on paired data for multi-modal learning

Abstract

Histopathology remains the gold standard for cancer diagnosis and prognosis. With the advent of transcriptome profiling, multi-modal learning combining transcriptomics with histology offers more comprehensive information. However, existing multi-modal approaches are challenged by intrinsic multi-modal heterogeneity, insufficient multi-scale integration, and reliance on paired data, restricting clinical applicability. To address these challenges, we propose a disentangled multi-modal framework with four contributions: 1) To mitigate multi-modal heterogeneity, we decompose WSIs and transcriptomes into tumor and microenvironment subspaces using a disentangled multi-modal fusion module, and introduce a confidence-guided gradient coordination strategy to balance subspace optimization. 2) To enhance multi-scale integration, we propose an inter-magnification gene-expression consistency…
