M-IDoL: Information Decomposition for Modality-Specific and Diverse Representation Learning in Medical Foundation Model
Yihang Liu, Longzhen Yang, Jiaxiong Yang, Ying Wen, Lianghua He, Heng Tao Shen

TL;DR
M-IDoL is a self-supervised medical foundation model that enhances modality-specific and diverse representations by decomposing multimodal information, leading to improved generalization across clinical tasks.
Contribution
M-IDoL introduces an information decomposition approach with two objectives, one targeting modality specificity and one targeting representation diversity, for medical multimodal representation learning.
Findings
Outperforms 20 foundation models on 21 clinical tasks
Learns clearer separation of features across modalities
Enriches intra-modality feature discrimination
Abstract
Medical foundation models (MFMs) aim to learn universal representations from multimodal medical images that can generalize effectively to diverse downstream clinical tasks. However, most existing MFMs suffer from information ambiguity that blends multimodal representations in a single embedding space, leading to the degradation of modality specificity and diversity. In this paper, we propose M-IDoL, a self-supervised MFM that introduces Information Decomposition for multimodal representation Learning via two objectives: i) maximizing inter-modality entropy by dispersing multimodal representations into separable Mixture-of-Experts (MoE) subspaces to achieve representation specificity across modalities; and ii) minimizing intra-modality uncertainty by performing fine-grained semantic discrimination within each MoE subspace to enrich representation diversity per modality. By pre-training…
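The two objectives in the abstract can be illustrated with a generic routing-entropy calculation. The sketch below is not the paper's loss, which the truncated abstract does not specify. It assumes a hypothetical router that emits one logit per expert for each image, and uses the entropy of the batch-averaged gate as a stand-in for "inter-modality entropy" and the average per-sample gate entropy as a stand-in for "intra-modality uncertainty". The function names and toy logits are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def routing_entropies(router_logits):
    """Return (marginal, conditional) entropy of expert routing.

    router_logits: one row per sample, one column per expert (hypothetical
    router output, not taken from the paper).
    marginal: entropy of the batch-averaged gate; high when the batch is
        spread across experts.
    conditional: mean entropy of each sample's gate; low when every sample
        is routed confidently to a single expert.
    """
    gates = [softmax(row) for row in router_logits]
    n_experts = len(gates[0])
    mean_gate = [sum(g[e] for g in gates) / len(gates) for e in range(n_experts)]
    marginal = entropy(mean_gate)
    conditional = sum(entropy(g) for g in gates) / len(gates)
    return marginal, conditional


# Toy batch: three samples, each confidently routed to a different expert.
separated = [[8.0, 0.0, 0.0], [0.0, 8.0, 0.0], [0.0, 0.0, 8.0]]
# Toy batch: three samples with no routing preference at all.
blended = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

sep_marginal, sep_conditional = routing_entropies(separated)
mix_marginal, mix_conditional = routing_entropies(blended)
```

In this toy setup the separated batch has a large marginal entropy and a near-zero conditional entropy, while the blended batch has equal marginal and conditional entropy. The gap between the two quantities is the mutual information between sample and expert, which is one common way to quantify how cleanly inputs are partitioned into subspaces.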
