Uni-Med: A Unified Medical Generalist Foundation Model For Multi-Task Learning Via Connector-MoE
Xun Zhu, Ying Hu, Fanbin Mo, Miao Li, Ji Wu

TL;DR
Uni-Med is a novel medical generalist foundation model that effectively addresses multi-task interference in multi-modal learning by introducing a connector mixture-of-experts, enabling it to perform diverse medical tasks with improved efficiency and accuracy.
Contribution
The paper presents Uni-Med, the first model to specifically tackle multi-task interference at the connector in multi-modal medical language models using a mixture-of-experts approach.
Findings
Introducing CMoE improves performance by up to 8%.
Successfully performs six diverse medical tasks.
Outperforms previous state-of-the-art medical MLLMs.
Abstract
Multi-modal large language models (MLLMs) have shown impressive capabilities as a general-purpose interface for various visual and linguistic tasks. However, building a unified MLLM for multi-task learning in the medical field remains a thorny challenge. To mitigate the tug-of-war problem of multi-modal multi-task optimization in MLLMs, recent advances primarily focus on improving the LLM components, while neglecting the connector that bridges the gap between modalities. In this paper, we introduce Uni-Med, a novel medical generalist foundation model that consists of a universal visual feature extraction module, a connector mixture-of-experts (CMoE) module, and an LLM. Benefiting from the proposed CMoE, which leverages a well-designed router with a mixture of projection experts at the connector, Uni-Med achieves an efficient solution to the tug-of-war problem and can perform six different…
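To make the idea of a router over a mixture of projection experts more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify the router design, so token-level soft routing, linear experts, and every name used here (`ConnectorMoE`, `router_w`, `expert_w`, the dimensions) are assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


class ConnectorMoE:
    """Hypothetical connector: soft-routed mixture of linear projection experts.

    Assumptions (not taken from the paper): routing is per visual token,
    gates are a softmax over all experts, and each expert is a single
    linear projection from the visual feature space to the LLM space.
    """

    def __init__(self, d_vis, d_llm, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        # Router maps each visual token to one logit per expert.
        self.router_w = rng.normal(0.0, 0.02, size=(d_vis, num_experts))
        # One projection matrix and bias per expert.
        self.expert_w = rng.normal(0.0, 0.02, size=(num_experts, d_vis, d_llm))
        self.expert_b = np.zeros((num_experts, d_llm))

    def __call__(self, visual_tokens):
        # visual_tokens: (n_tokens, d_vis)
        gates = softmax(visual_tokens @ self.router_w, axis=-1)  # (n_tokens, E)
        # Project every token with every expert: (E, n_tokens, d_llm).
        expert_out = (
            np.einsum("nd,edk->enk", visual_tokens, self.expert_w)
            + self.expert_b[:, None, :]
        )
        # Combine expert outputs per token, weighted by the router gates.
        mixed = np.einsum("ne,enk->nk", gates, expert_out)  # (n_tokens, d_llm)
        return mixed, gates


# Toy usage with made-up sizes: 5 visual tokens, 16-dim features, 32-dim LLM space.
connector = ConnectorMoE(d_vis=16, d_llm=32, num_experts=4)
tokens = np.random.default_rng(1).normal(size=(5, 16))
projected, gates = connector(tokens)
```

In this sketch every expert contributes to every token. A sparse variant would keep only the top-scoring experts per token, and a task-level variant would compute one gate vector per task instead of per token. Which of these the paper adopts is not stated in the excerpt above.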
Taxonomy
Topics: Machine Learning in Healthcare
Methods: Focus
