M-IDoL: Information Decomposition for Modality-Specific and Diverse Representation Learning in Medical Foundation Model

Yihang Liu; Longzhen Yang; Jiaxiong Yang; Ying Wen; Lianghua He; Heng Tao Shen

arXiv:2604.08936 · cs.CV · May 19, 2026

TL;DR

M-IDoL is a self-supervised medical foundation model that enhances modality-specific and diverse representations by decomposing multimodal information, leading to improved generalization across clinical tasks.

Contribution

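M-IDoL introduces an information decomposition approach with two objectives: maximizing inter-modality entropy to improve modality specificity, and minimizing intra-modality uncertainty to improve representation diversity in medical multimodal representation learning.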

Findings

01 Outperforms 20 foundation models on 21 clinical tasks

02 Learns clearer separation of features across modalities

03 Enriches intra-modality feature discrimination

Abstract

Medical foundation models (MFMs) aim to learn universal representations from multimodal medical images that can generalize effectively to diverse downstream clinical tasks. However, most existing MFMs suffer from information ambiguity that blends multimodal representations in a single embedding space, leading to the degradation of modality specificity and diversity. In this paper, we propose M-IDoL, a self-supervised MFM that introduces Information Decomposition for multimodal representation Learning via two objectives: i) maximizing inter-modality entropy by dispersing multimodal representations into separable Mixture-of-Experts (MoE) subspaces to achieve representation specificity across modalities; and ii) minimizing intra-modality uncertainty by performing fine-grained semantic discrimination within each MoE subspace to enrich representation diversity per modality. By pre-training…
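The two objectives in the abstract can be read loosely through a standard information-theoretic identity: for a soft assignment of samples to experts, the mutual information between sample and expert equals the entropy of the average assignment minus the average per-sample entropy. The sketch below illustrates only that generic identity on toy gate probabilities. It is not the paper's loss; the function names (`entropy`, `routing_information`) and the toy `gates` arrays are hypothetical and chosen for illustration.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along `axis` (probabilities clipped for stability)."""
    p = np.clip(p, eps, 1.0)
    return -np.sum(p * np.log(p), axis=axis)

def routing_information(gates):
    """Decompose expert-routing information for a batch.

    gates: (N, E) array; each row is a probability distribution over E experts.
    Returns (marginal entropy, mean per-sample entropy, their difference).
    """
    marginal = gates.mean(axis=0)           # average expert usage over the batch
    h_marginal = entropy(marginal)          # high when usage is spread over experts
    h_conditional = entropy(gates).mean()   # low when each sample is routed confidently
    return h_marginal, h_conditional, h_marginal - h_conditional

# Toy case 1: 8 samples routed one-hot and evenly across 4 experts.
sharp_gates = np.eye(4)[np.arange(8) % 4]
# Toy case 2: every sample spread uniformly over all 4 experts.
uniform_gates = np.full((8, 4), 0.25)

print(routing_information(sharp_gates))
print(routing_information(uniform_gates))
```

In the first toy case the difference is approximately log(4), the maximum for four experts, because usage is balanced and every sample is routed confidently. In the second it is zero: usage is still balanced, but no sample is distinguishable by its routing.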
