The Rise of Sparse Mixture-of-Experts: A Survey from Algorithmic Foundations to Decentralized Architectures and Vertical Domain Applications
Dong Pan, Bingtao Li, Yongsheng Zheng, Jiren Ma, Victor Fei

TL;DR
This survey comprehensively reviews the development, principles, decentralized architectures, and diverse applications of sparse Mixture-of-Experts models, highlighting recent advancements and future research directions in the field.
Contribution
It provides the most extensive overview of MoE, including foundational principles, decentralized paradigms, and vertical domain applications, filling gaps left by previous surveys.
Findings
Decentralized MoE models enable greater scalability and democratization.
MoE models significantly improve efficiency in various AI domains.
The survey identifies key challenges and future directions for MoE research.
Abstract
The sparse Mixture-of-Experts (MoE) architecture has emerged as a powerful approach for scaling deep learning models to more parameters at comparable computation cost. As an important branch of large language models (LLMs), MoE models activate only a subset of experts, selected by a routing network. This sparse conditional computation mechanism significantly improves computational efficiency, paving a promising path toward greater scalability and cost-efficiency. It not only enhances downstream applications in horizontal domains such as natural language processing, computer vision, and multimodal learning, but also exhibits broad applicability across vertical domains. Despite the growing popularity and application of MoE models across various domains, a systematic exploration of recent MoE advancements in many important fields is still lacking. Existing surveys on MoE suffer from limitations such as…
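The sparse routing idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes softmax gating with top-k selection and renormalized weights, which is one common scheme and not necessarily the formulation the survey focuses on. The function names (`top_k_route`, `moe_forward`) and the toy scalar experts are hypothetical, chosen only for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of router logits.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Assumption: weights of the selected experts are renormalized to sum to 1.
    """
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in chosen)
    return [(i, probs[i] / norm) for i in chosen]

def moe_forward(x, router_logits, experts, k=2):
    """Weighted sum over the selected experts only; the rest are never called."""
    routes = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in routes)

# Hypothetical toy experts: each one just scales a scalar input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
logits = [0.1, 2.0, -1.0, 1.5]  # illustrative router scores for one token

routes = top_k_route(logits, k=2)
y = moe_forward(1.0, logits, experts, k=2)
```

With four experts and k=2, only two expert functions run per input, which is the sense in which parameter count can grow while per-token computation stays roughly constant.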
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Mobile Crowdsensing and Crowdsourcing · Human Mobility and Location-Based Analysis
