M$^4$oE: A Foundation Model for Medical Multimodal Image Segmentation with Mixture of Experts
Yufeng Jiang, Yiqing Shen

TL;DR
The paper introduces M$^4$oE, a multimodal medical image segmentation model using a mixture of experts approach with a gating network, improving accuracy, interpretability, and efficiency across diverse datasets.
Contribution
It proposes a novel mixture of experts framework for medical multimodal segmentation that enhances generalization, interpretability, and computational efficiency.
Findings
Achieves up to 11.93% improvement over baseline models.
Reduces training time by 7 hours compared to similar methods.
Uses only 30% of the parameters of comparable models.
Abstract
Medical imaging data is inherently heterogeneous across different modalities and clinical centers, posing unique challenges for developing generalizable foundation models. Conventional practice entails training distinct models per dataset or using a shared encoder with modality-specific decoders. However, these approaches incur heavy computational overheads and suffer from poor scalability. To address these limitations, we propose the Medical Multimodal Mixture of Experts (M$^4$oE) framework, leveraging the SwinUNet architecture. Specifically, M$^4$oE comprises modality-specific experts, each separately initialized to learn features encoding domain knowledge. Subsequently, a gating network is integrated during fine-tuning to dynamically modulate each expert's contribution to the collective predictions. This enhances model interpretability and generalization ability while retaining expertise…
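The gating step described in the abstract can be illustrated with a toy sketch. The abstract excerpt does not state the functional form of the gate, so the softmax weighting below is an assumption, and the names `softmax`, `gate_experts`, `gate_scores`, and `expert_outputs` are hypothetical; this is not the authors' implementation. It only shows how one weight per expert can blend per-expert predictions, and why those weights are readable as each expert's contribution.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gate_experts(gate_scores, expert_outputs):
    """Blend per-expert class logits using softmax gate weights.

    gate_scores: one raw score per expert (stand-in for a gating
        network's output; assumed, not taken from the paper).
    expert_outputs: one equal-length list of class logits per expert.
    Returns (weights, blended_logits).
    """
    if len(gate_scores) != len(expert_outputs):
        raise ValueError("need exactly one gate score per expert")
    weights = softmax(gate_scores)
    num_classes = len(expert_outputs[0])
    blended = [
        sum(w * out[c] for w, out in zip(weights, expert_outputs))
        for c in range(num_classes)
    ]
    return weights, blended


# Toy data: three hypothetical modality experts, two classes, one pixel.
expert_outputs = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
weights, blended = gate_experts([0.0, 0.0, 0.0], expert_outputs)
```

Because the weights sum to one, they can be inspected directly to see which expert dominated a given prediction, which is one plausible reading of the interpretability claim.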
Taxonomy
Topics: Image Retrieval and Classification Techniques · Medical Image Segmentation Techniques · AI in cancer detection
