DIVE into MoE: Diversity-Enhanced Reconstruction of Large Language Models from Dense into Mixture-of-Experts

Yuchen Feng; Bowen Shen; Naibin Gu; Jiaxuan Zhao; Peng Fu; Zheng Lin; Weiping Wang

arXiv:2506.09351 · cs.CL · June 12, 2025

TL;DR

This paper introduces DIVE, a method for reconstructing dense large language models into Mixture-of-Experts architectures that emphasizes diversity among experts, reducing training cost while maintaining accuracy.

Contribution

The paper proposes DIVE, a diversity-aware reconstruction approach that combines domain affinity mining, pruning-based expert reconstruction, and efficient retraining to convert dense LLMs into MoE models.

Findings

01 DIVE outperforms existing methods in training efficiency.

02 DIVE maintains high accuracy with minimal trade-offs.

03 DIVE effectively enhances expert diversity during reconstruction.

Abstract

Large language models (LLMs) with the Mixture-of-Experts (MoE) architecture achieve high cost-efficiency by selectively activating a subset of their parameters. Despite the inference efficiency of MoE LLMs, training a large number of experts from scratch incurs substantial overhead, whereas reconstructing a dense LLM into an MoE LLM significantly reduces the training budget. However, existing reconstruction methods often overlook the diversity among experts, leading to potential redundancy. In this paper, we observe that a given LLM exhibits notable diversity after being pruned on different calibration datasets, and based on this observation we present a Diversity-Enhanced reconstruction method named DIVE. The recipe of DIVE includes domain affinity mining, pruning-based expert reconstruction, and efficient retraining. Specifically, the reconstruction includes pruning and…
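
The observation in the abstract, that pruning the same model on different calibration datasets yields different sub-networks, can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the importance score (mean ReLU activation per hidden neuron), the synthetic "domains", the layer sizes, and the helper names `neuron_importance`, `prune_mask`, and `jaccard` are all hypothetical, and DIVE's actual pruning criterion may differ.

```python
import random

def relu(x):
    return x if x > 0.0 else 0.0

def neuron_importance(w_in, calib):
    """Mean ReLU activation of each hidden neuron over a calibration set.
    Assumed importance score for this sketch, not the paper's criterion."""
    scores = []
    for row in w_in:  # one row of input weights per hidden neuron
        total = 0.0
        for x in calib:
            total += relu(sum(w * xi for w, xi in zip(row, x)))
        scores.append(total / len(calib))
    return scores

def prune_mask(scores, keep):
    """Indices of the `keep` highest-scoring neurons (the ones that survive pruning)."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:keep])

def jaccard(a, b):
    """Overlap between two sets of kept neurons; 1.0 means identical masks."""
    return len(a & b) / len(a | b)

def make_domain(rng, center, n=64, noise=0.1):
    """Synthetic calibration set: noisy samples around a domain-specific center."""
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(n)]

rng = random.Random(0)
d_model, d_hidden, keep = 8, 32, 8

# Toy dense feed-forward input projection shared by both pruning runs.
w_in = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(d_hidden)]

# Two hypothetical calibration domains that excite different input features.
domain_a = make_domain(rng, [1.0] * 4 + [0.0] * 4)
domain_b = make_domain(rng, [0.0] * 4 + [1.0] * 4)

# Prune the same layer once per domain; each mask is a candidate "expert".
mask_a = prune_mask(neuron_importance(w_in, domain_a), keep)
mask_b = prune_mask(neuron_importance(w_in, domain_b), keep)
overlap = jaccard(mask_a, mask_b)

print("kept on domain A:", sorted(mask_a))
print("kept on domain B:", sorted(mask_b))
print("Jaccard overlap:", overlap)
```

The printed overlap indicates how much the two pruned sub-networks share: a lower value means the two candidate experts retain more distinct neurons, which is the kind of diversity the abstract refers to.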

Taxonomy

Topics: Topic Modeling · Computational and Text Analysis Methods · Domain Adaptation and Few-Shot Learning