Learning Emergent Modular Representations in Multi-modality Medical Vision Foundation Models

Yuting He; Chenyu You; Shuo Li

arXiv:2605.21861·cs.CV·May 22, 2026

TL;DR

This paper introduces Director-Experts (DEX), a modular network for multi-modality medical vision models that improves representation learning across diverse imaging modalities by balancing specialization and coordination.

Contribution

The work proposes DEX, a novel modular architecture with dynamic experts and a director, and curates a new large-scale benchmark for multi-modality medical imaging pre-training.

Findings

01  DEX improves optimization and transferability on 26 downstream tasks.

02  The authors curate the Medical Vision Universe benchmark, with over 4 million images across 10 modalities.

03  The work demonstrates the emergence of modular representations in multi-modality medical vision models.

Abstract

Multi-modality medical vision (MV) foundation models (FMs) are fundamentally challenged by pronounced non-IID feature statistics across heterogeneous imaging modalities. Monolithic self-supervised optimization on such data induces conflicting gradients, driving representations to collapse toward modality-dominant shortcuts. This work reframes this failure as an imbalance between specialization and coordination in emergent modularity, and proposes Director-Experts (DEX), a modular network that explicitly regulates these dynamics in stacked modules. Each DEX module comprises a pool of experts and a director. The experts are dynamically adapted by the authors' image-wise activation strategy and autonomously specialize in modality-dominant statistics. The director is updated via the authors' group exponential moving average, which distills multi-expert knowledge into a shared space for semantic integration across modalities, thus…
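
The abstract names two mechanisms, image-wise expert activation and a group exponential moving average (EMA) for the director, without giving their update rules here. The sketch below is only an illustration of how such mechanisms could look, not the authors' implementation. Everything in it is an assumption: the function names `group_ema_update` and `route_image_wise`, the choice to average expert weights before the EMA step, the mean-pooling of tokens, the top-1 selection, and the `momentum` parameter are all hypothetical.

```python
from statistics import fmean


def group_ema_update(director, experts, momentum=0.99):
    """Assumed rule: move the director toward the mean of the expert weights.

    director: list of floats (hypothetical flattened director weights)
    experts:  list of weight lists, one per expert, same length as director
    """
    # Average each weight position across the whole group of experts.
    expert_mean = [fmean(values) for values in zip(*experts)]
    # Standard EMA step, applied against the group average.
    return [
        momentum * d + (1.0 - momentum) * e
        for d, e in zip(director, expert_mean)
    ]


def route_image_wise(image_tokens, gate_weights):
    """Assumed rule: choose one expert for the whole image, not per token.

    image_tokens: list of token feature vectors for a single image
    gate_weights: one gating vector per expert (hypothetical linear gate)
    """
    dim = len(image_tokens[0])
    # Pool token features into a single image-level descriptor.
    pooled = [fmean(token[i] for token in image_tokens) for i in range(dim)]
    # Score every expert against the pooled descriptor.
    scores = [sum(p * w for p, w in zip(pooled, gate)) for gate in gate_weights]
    # Top-1 selection: index of the highest-scoring expert.
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    director = group_ema_update([0.0, 0.0], [[1.0, 1.0], [3.0, 3.0]], momentum=0.5)
    print(director)
    print(route_image_wise([[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]))
```

In this reading, routing at the image level keeps all tokens of one scan on the same expert, while the EMA step lets the director change more slowly than any single expert. The paper and the linked repository are the authority on the actual rules.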

Code & Models

Repositories

YutingHe-list/DEX (GitHub)