Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts

Yunxin Li; Shenyuan Jiang; Baotian Hu; Longyue Wang; Wanqi Zhong; Wenhan Luo; Lin Ma; and Min Zhang

arXiv:2405.11273 · cs.AI · May 21, 2024 · 3 cites

TL;DR

Uni-MoE introduces a scalable, unified multimodal large language model architecture using Mixture of Experts, enabling efficient training and inference across diverse modalities with improved generalization and reduced bias.

Contribution

This work pioneers the development of a unified multimodal LLM with MoE architecture, incorporating modality-specific encoders, a progressive training strategy, and extensive evaluation.

Findings

01 Significantly reduces performance bias across modalities

02 Enhances multi-expert collaboration and generalization

03 Demonstrates effective handling of diverse multimodal datasets

Abstract

Recent advancements in Multimodal Large Language Models (MLLMs) underscore the significance of scalable models and data to boost performance, yet this often incurs substantial computational costs. Although the Mixture of Experts (MoE) architecture has been employed to efficiently scale large language and image-text models, these efforts typically involve fewer experts and limited modalities. To address this, our work presents the pioneering attempt to develop a unified MLLM with the MoE architecture, named Uni-MoE, which can handle a wide array of modalities. Specifically, it features modality-specific encoders with connectors for a unified multimodal representation. We also implement a sparse MoE architecture within the LLMs to enable efficient training and inference through modality-level data parallelism and expert-level model parallelism. To enhance the multi-expert collaboration and…
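The sparse MoE layer mentioned in the abstract can be illustrated with a generic top-k routing sketch. This is not Uni-MoE's implementation: the function name `sparse_moe_layer`, the use of single linear maps as experts, the NumPy setting, and `top_k=2` are all assumptions chosen for illustration. The idea shown is only that a router scores every expert for each token, and each token is processed by its few highest-scoring experts, whose outputs are mixed by normalized gate weights.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sparse_moe_layer(tokens, router_w, expert_ws, top_k=2):
    """Generic top-k MoE routing (illustrative, not the paper's code).

    tokens:    (n, d) token representations
    router_w:  (d, E) router projection, one score per expert
    expert_ws: (E, d, d) one linear map per expert (an assumption;
               real experts are typically feed-forward blocks)
    """
    logits = tokens @ router_w                          # (n, E)
    top_idx = np.argsort(-logits, axis=-1)[:, :top_k]   # (n, k) chosen experts
    top_logits = np.take_along_axis(logits, top_idx, axis=-1)
    gates = softmax(top_logits, axis=-1)                # (n, k), rows sum to 1

    out = np.zeros_like(tokens)
    for e in range(expert_ws.shape[0]):
        # Tokens that selected expert e, and the slot in which they did.
        token_pos, slot = np.nonzero(top_idx == e)
        if token_pos.size == 0:
            continue  # this expert does no work for the batch
        expert_out = tokens[token_pos] @ expert_ws[e]
        out[token_pos] += gates[token_pos, slot][:, None] * expert_out
    return out, top_idx, gates

# Hypothetical sizes: 6 tokens, width 8, 4 experts.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 8))
router_w = rng.normal(size=(8, 4))
expert_ws = rng.normal(size=(4, 8, 8))
out, top_idx, gates = sparse_moe_layer(tokens, router_w, expert_ws, top_k=2)
```

With these sizes each token touches only 2 of the 4 experts, so per-token compute stays roughly constant as more experts are added. That is the general motivation for sparse routing; how Uni-MoE distributes experts across devices is not covered by this sketch.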

Code & Models

Repositories

hitsz-tmg/umoe-scaling-unified-multimodal-llms
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems

Methods: Sparse Evolutionary Training · Mixture of Experts