LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
Xiaoye Qu, Daize Dong, Xuyang Hu, Tong Zhu, Weigao Sun, Yu Cheng

TL;DR
This paper investigates the sparsity of LLaMA models by constructing Mixture-of-Experts (MoE) versions of the attention and MLP modules in the transformer blocks, evaluates the resulting models across domains such as conversation, code, and math, and proposes a post-training strategy to mitigate sparsity-induced performance loss.
Contribution
The paper analyzes the effect of sparsifying LLaMA with MoE, compares expert construction methods and granularities, and proposes a two-stage post-training strategy to improve performance after sparsification.
Findings
Sparsified LLaMA models maintain competitive performance across domains.
Different expert construction methods impact sparsity and effectiveness.
Post-training strategies can recover performance degraded by sparsity.
Abstract
Recently, inspired by the concept of sparsity, Mixture-of-Experts (MoE) models have gained increasing popularity for scaling model size while keeping the number of activated parameters constant. In this study, we thoroughly investigate the sparsity of the dense LLaMA model by constructing MoE for both the attention (i.e., Attention MoE) and MLP (i.e., MLP MoE) modules in the transformer blocks. Specifically, we investigate different expert construction methods and granularities under the same activation conditions to analyze the impact of sparsifying the model. Additionally, to comprehensively evaluate the model's capabilities across various domains (e.g., conversation, code, math) after sparsification, we apply sparsity to the instructed large language models (LLMs) and construct instructed MoE models. To counteract the performance degradation resulting from increased sparsity, we…
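To make the idea of an MLP MoE concrete, the following is a minimal sketch of one possible expert construction: partitioning the intermediate neurons of a dense gated MLP into equal-sized experts and activating only the top-k per input. It is not taken from the paper. The contiguous partitioning, the randomly initialized router, the unscaled sum over selected experts, and all names (`split_into_experts`, `moe_mlp`, `W_router`) are assumptions made for illustration.

```python
import math
import random

def silu(x):
    return x / (1.0 + math.exp(-x))

def matvec(W, x):
    # W is a list of rows; returns W @ x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gated_mlp(x, W_gate, W_up, W_down_T):
    # Gated MLP: down(silu(gate(x)) * up(x)).
    # Row j of W_down_T is the output contribution of intermediate neuron j.
    hidden = [silu(g) * u for g, u in zip(matvec(W_gate, x), matvec(W_up, x))]
    out = [0.0] * len(x)
    for h, row in zip(hidden, W_down_T):
        for i, w in enumerate(row):
            out[i] += h * w
    return out

def split_into_experts(W_gate, W_up, W_down_T, num_experts):
    # Assumption: experts are contiguous, equal-sized slices of the neurons.
    d_ff = len(W_gate)
    assert d_ff % num_experts == 0
    size = d_ff // num_experts
    return [
        (W_gate[e * size:(e + 1) * size],
         W_up[e * size:(e + 1) * size],
         W_down_T[e * size:(e + 1) * size])
        for e in range(num_experts)
    ]

def moe_mlp(x, experts, W_router, top_k):
    # Hypothetical linear router: one score per expert, keep the top_k.
    scores = matvec(W_router, x)
    chosen = sorted(range(len(experts)), key=lambda e: scores[e], reverse=True)[:top_k]
    out = [0.0] * len(x)
    for e in chosen:
        out = [a + b for a, b in zip(out, gated_mlp(x, *experts[e]))]
    return out, chosen

def rand_matrix(rows, cols):
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
d_model, d_ff, num_experts, top_k = 4, 8, 4, 2
W_gate, W_up, W_down_T = (rand_matrix(d_ff, d_model) for _ in range(3))
W_router = rand_matrix(num_experts, d_model)
x = [random.uniform(-1, 1) for _ in range(d_model)]

experts = split_into_experts(W_gate, W_up, W_down_T, num_experts)
dense_out = gated_mlp(x, W_gate, W_up, W_down_T)
full_out, _ = moe_mlp(x, experts, W_router, top_k=num_experts)  # all experts active
sparse_out, chosen = moe_mlp(x, experts, W_router, top_k=top_k)  # sparse forward pass
active_fraction = top_k * (d_ff // num_experts) / d_ff
```

Under these assumptions, activating every expert reproduces the dense output up to floating-point rounding, because the experts are disjoint slices of the same neurons. With `top_k = 2` of 4 experts, only half of the MLP parameters are used per input, and the gap between `sparse_out` and `dense_out` is the kind of degradation a post-training stage would aim to reduce.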
Taxonomy
Topics: Machine Learning and Algorithms · Computability, Logic, AI Algorithms · Topic Modeling
Methods: Softmax · Attention Is All You Need · Mixture of Experts · LLaMA
