MedMoE: Modality-Specialized Mixture of Experts for Medical Vision-Language Understanding
Shivang Chopra, Gabriela Sanchez-Rodriguez, Lingchao Mao, Andrew J Feola, Jing Li, Zsolt Kira

TL;DR
MedMoE introduces a modality-specific mixture of experts framework that dynamically adapts visual representations for medical vision-language tasks, improving alignment and retrieval across diverse imaging modalities.
Contribution
The paper proposes MedMoE, a novel modular framework with a Mixture-of-Experts module conditioned on report type, enabling modality-specific visual feature extraction without additional supervision.
Findings
Enhanced alignment and retrieval performance across multiple medical imaging modalities.
Effective spatially adaptive attention to clinically relevant regions.
Improved generalization in medical vision-language understanding.
Abstract
Different medical imaging modalities capture diagnostic information at varying spatial resolutions, from coarse global patterns to fine-grained localized structures. However, most existing vision-language frameworks in the medical domain apply a uniform strategy for local feature extraction, overlooking the modality-specific demands. In this work, we present MedMoE, a modular and extensible vision-language processing framework that dynamically adapts visual representation based on the diagnostic context. MedMoE incorporates a Mixture-of-Experts (MoE) module conditioned on the report type, which routes multi-scale image features through specialized expert branches trained to capture modality-specific visual semantics. These experts operate over feature pyramids derived from a Swin Transformer backbone, enabling spatially adaptive attention to clinically relevant regions. This framework…
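The routing idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the class name `ToyMoE`, the soft (softmax) gate, the per-expert scale weights, and the use of pooled feature vectors in place of Swin Transformer feature maps are all assumptions made for illustration. The excerpt does not say whether MedMoE routes softly or selects a single expert. The sketch only shows how a report embedding could weight expert branches that each combine multi-scale features differently.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class ToyMoE:
    """Hypothetical report-conditioned mixture of experts over pyramid levels."""

    def __init__(self, num_experts, num_scales, text_dim, seed=0):
        rng = random.Random(seed)
        # Gate: one weight vector per expert, applied to the report embedding.
        self.gate = [[rng.uniform(-1, 1) for _ in range(text_dim)]
                     for _ in range(num_experts)]
        # Each expert holds its own preference (logits) over pyramid scales.
        # Assumption: real experts would be learned networks, not fixed logits.
        self.scale_logits = [[rng.uniform(-1, 1) for _ in range(num_scales)]
                             for _ in range(num_experts)]

    def route(self, report_emb):
        """Return soft routing weights over experts for one report embedding."""
        return softmax([dot(w, report_emb) for w in self.gate])

    def forward(self, pyramid, report_emb):
        """Fuse pyramid features.

        pyramid: list of num_scales vectors of equal length (assumed to be
        already pooled and projected to a shared dimension).
        """
        gate_w = self.route(report_emb)
        dim = len(pyramid[0])
        fused = [0.0] * dim
        for g, logits in zip(gate_w, self.scale_logits):
            scale_w = softmax(logits)  # this expert's mix of coarse vs. fine
            for s, feat in zip(scale_w, pyramid):
                for i in range(dim):
                    fused[i] += g * s * feat[i]
        return fused, gate_w


if __name__ == "__main__":
    moe = ToyMoE(num_experts=3, num_scales=4, text_dim=5)
    report_emb = [0.2, -0.1, 0.4, 0.0, 0.3]          # stand-in text embedding
    pyramid = [[float(k + i) for i in range(3)] for k in range(4)]
    fused, gate_w = moe.forward(pyramid, report_emb)
    print("routing weights:", gate_w)
    print("fused feature:", fused)
```

Because both the gate and each expert's scale weights are softmax outputs, the fused vector is a convex combination of the pyramid levels. Changing the report embedding shifts which expert, and therefore which mix of spatial scales, dominates.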
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling
Methods: Linear Layer · Dense Connections · Stochastic Depth · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Adam · Attention Is All You Need · Softmax · Swin Transformer · Label Smoothing
