TL;DR
This paper introduces Director-Experts (DEX), a modular network for multi-modality medical vision models that improves representation learning across diverse imaging modalities by balancing specialization and coordination.
Contribution
The work proposes DEX, a novel modular architecture with dynamic experts and a director, and curates a new large-scale benchmark for multi-modality medical imaging pre-training.
Findings
DEX improves optimization and transferability on 26 downstream tasks.
Curates the Medical Vision Universe benchmark, with over 4 million images across 10 modalities.
Demonstrates the emergence of modular representations in multi-modality medical vision models.
Abstract
Multi-modality medical vision (MV) foundation models (FM) are fundamentally challenged by pronounced Non-IID feature statistics across heterogeneous imaging modalities. Monolithic self-supervised optimization on such data induces conflicting gradients, driving representations to collapse toward modality-dominant shortcuts. This work reframes this failure as an imbalance between specialization and coordination in emergent modularity, and proposes Director-Experts (DEX), a modular network that explicitly regulates these dynamics in stacked modules. Each DEX module comprises a pool of experts, dynamically adapted by our image-wise activation strategy, autonomously specializing in modality-dominant statistics, together with a director, updated via our group exponential moving average, which distills multi-expert knowledge into a shared space for semantic integration across modalities, thus…
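The abstract names two mechanisms without giving their equations: an image-wise activation strategy that picks experts per image, and a group exponential moving average that updates the director. The sketch below is one plausible reading, not the authors' implementation. It assumes a linear gate with top-k selection for the activation step, and a director that is pulled toward the mean of the currently active experts' weights. The names `select_experts`, `group_ema_update`, `gate_w`, `k`, and `momentum` are all hypothetical.

```python
import numpy as np

def select_experts(image_feat, gate_w, k=2):
    """Image-wise activation (assumed form): score every expert for one image, keep the top k."""
    scores = gate_w @ image_feat              # one score per expert, shape (num_experts,)
    top = np.argsort(scores)[::-1][:k]        # indices of the k highest-scoring experts
    return np.sort(top)

def group_ema_update(director_w, expert_ws, active, momentum=0.99):
    """Group EMA (assumed form): move the director toward the mean of the active experts."""
    group_mean = expert_ws[active].mean(axis=0)
    return momentum * director_w + (1.0 - momentum) * group_mean

# Toy setup: 4 experts, each a dim x dim weight matrix (sizes are illustrative only).
rng = np.random.default_rng(0)
num_experts, dim = 4, 8
gate_w = rng.normal(size=(num_experts, dim))
expert_ws = rng.normal(size=(num_experts, dim, dim))
director_w = np.zeros((dim, dim))

image_feat = rng.normal(size=dim)             # stand-in for one image's feature vector
active = select_experts(image_feat, gate_w, k=2)
director_w = group_ema_update(director_w, expert_ws, active, momentum=0.9)
```

In this reading the experts are free to diverge per modality, while the director changes slowly and only through averaged expert weights, which is one way to keep a shared space stable. The paper's actual update rule may differ.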
