Mixture of Experts in Large Language Models
Danyang Zhang, Junhao Song, Ziqian Bi, Xinyuan Song, Yingfang Yuan, Tianyang Wang, Joe Yeong, Junfeng Hao

TL;DR
This paper reviews the Mixture-of-Experts architecture in large language models, emphasizing its ability to improve performance and scalability while discussing design, challenges, and future research directions.
Contribution
It provides a comprehensive analysis of MoE architectures, covering theoretical foundations, design choices, applications, and recent advances, highlighting key challenges and future directions.
Findings
MoE enhances model capacity with minimal computational overhead.
Expert diversity and calibration are crucial for effective MoE performance.
Recent advances address scalability and deployment challenges.
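As a rough illustration of the first finding, the sketch below counts the parameters stored in one sparse MoE feed-forward layer against those actually used per token. The function name, the layer shape (two weight matrices per expert, a single linear router, no biases), and the example sizes are all assumptions made for illustration and are not taken from the paper.

```python
def moe_ffn_params(d_model, d_ff, num_experts, top_k):
    """Return (total, active_per_token) parameter counts for one MoE layer.

    Assumptions (illustrative only): each expert is a feed-forward block with
    two weight matrices (d_model x d_ff and d_ff x d_model, biases ignored),
    and the router is one d_model x num_experts matrix.
    """
    per_expert = 2 * d_model * d_ff
    router = d_model * num_experts
    total = num_experts * per_expert + router   # parameters stored
    active = top_k * per_expert + router        # parameters used for one token
    return total, active


# Hypothetical sizes: 8 experts, 2 of them active per token.
total, active = moe_ffn_params(d_model=1024, d_ff=4096, num_experts=8, top_k=2)
print(f"total={total:,} active={active:,} ratio={active / total:.2%}")
```

Under these assumed sizes, each token touches roughly a quarter of the layer's parameters, which is the sense in which capacity can grow faster than per-token compute.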
Abstract
This paper presents a comprehensive review of the Mixture-of-Experts (MoE) architecture in large language models, highlighting its ability to significantly enhance model performance while maintaining minimal computational overhead. Through a systematic analysis spanning theoretical foundations, core architectural designs, and large language model (LLM) applications, we examine expert gating and routing mechanisms, hierarchical and sparse MoE configurations, meta-learning approaches, multimodal and multitask learning scenarios, real-world deployment cases, and recent advances and challenges in deep learning. Our analysis identifies key advantages of MoE, including superior model capacity compared to equivalent Bayesian approaches, improved task-specific performance, and the ability to scale model capacity efficiently. We also underscore the importance of ensuring expert diversity,…
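The expert gating and routing mechanisms mentioned in the abstract can be sketched with a generic top-k router. This is a minimal pure-Python sketch of one common sparse-routing scheme, not necessarily the formulation the paper analyzes; the names `top_k_route` and `moe_forward`, and the toy experts, are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def top_k_route(router_logits, k):
    """Select the k highest-scoring experts and renormalize their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, router_logits, experts, k=2):
    """Weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(x)
    for idx, w in top_k_route(router_logits, k):
        y = experts[idx](x)  # unselected experts are never evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


# Toy experts (assumed for illustration): expert i scales its input by i + 1.
experts = [lambda x, s=s: [s * v for v in x] for s in (1.0, 2.0, 3.0, 4.0)]
routes = top_k_route([0.1, 2.0, -1.0, 2.0], k=2)
result = moe_forward([1.0, 2.0], [0.1, 2.0, -1.0, 2.0], experts, k=2)
```

In a trained model the router logits would come from a learned projection of the token representation; here they are passed in directly to keep the sketch self-contained.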
Taxonomy
Topics: Expert finding and Q&A systems
Methods: Mixture of Experts
