A Survey on Mixture of Experts in Large Language Models
Weilin Cai, Juyong Jiang, Fan Wang, Jing Tang, Sunghun Kim, Jiayi Huang

TL;DR
This survey comprehensively reviews the structure, design, applications, and future directions of mixture of experts (MoE) models in large language models, providing a valuable resource for researchers in this rapidly evolving field.
Contribution
It offers a new taxonomy of MoE, summarizes core design principles, and compiles open-source implementations and empirical evaluations, filling a gap in systematic literature review.
Findings
Provides a taxonomy of MoE models
Summarizes core design and systemic aspects
Lists open-source implementations and empirical results
Abstract
Large language models (LLMs) have driven unprecedented advances across diverse fields, ranging from natural language processing to computer vision and beyond. The prowess of LLMs is underpinned by their substantial model size, extensive and diverse datasets, and the vast computational power harnessed during training, all of which contribute to the emergent abilities of LLMs (e.g., in-context learning) that are not present in small models. Within this context, the mixture of experts (MoE) has emerged as an effective method for substantially scaling up model capacity with minimal computation overhead, gaining significant attention from academia and industry. Despite its growing prevalence, a systematic and comprehensive review of the literature on MoE is still lacking. This survey seeks to bridge that gap, serving as an essential resource for researchers delving into the intricacies of…
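The abstract's claim that MoE scales capacity with minimal computation overhead can be illustrated with a small sketch. The code below is not taken from the survey; it assumes one common formulation, in which a linear router scores every expert, only the top-k experts are evaluated, and their outputs are mixed with a softmax over the selected scores. The helper names (`make_expert`, `moe_forward`), the dimensions, and the random weights are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(dim, rng):
    """Hypothetical expert: a single dim x dim linear map with random weights."""
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


def moe_forward(x, gate_weights, experts, k):
    """Route x to the k highest-scoring experts and mix their outputs."""
    # Router logits: one score per expert (assumed linear gate).
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    # Sparse selection: only the top-k experts are ever evaluated.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Normalise the selected scores so the mixing weights sum to 1.
    probs = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for p, i in zip(probs, top):
        y = experts[i](x)
        out = [o + p * yi for o, yi in zip(out, y)]
    return out, top


rng = random.Random(0)
dim, num_experts, k = 4, 8, 2
experts = [make_expert(dim, rng) for _ in range(num_experts)]
gate_weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]

x = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_forward(x, gate_weights, experts, k)
print(len(chosen), "of", num_experts, "experts evaluated")
```

In this sketch the layer holds the parameters of all eight experts, yet each input pays the compute cost of only two, which is the sense in which capacity grows faster than per-token computation.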
Taxonomy
Topics: Expert finding and Q&A systems · Facility Location and Emergency Management
Methods: Softmax · Attention Is All You Need · Mixture of Experts
