Automatic Expert Discovery in LLM Upcycling via Sparse Interpolated Mixture-of-Experts

Shengzhuang Chen; Ying Wei; Jonathan Richard Schwarz

arXiv:2506.12597 · cs.LG · June 17, 2025

TL;DR

This paper introduces SIMoE, a novel instruction-tuning method that converts dense LLMs into sparse Mixture-of-Experts models, automatically discovering specialized experts and improving downstream task performance.

Contribution

The paper proposes SIMoE, an end-to-end algorithm for automatic expert discovery and input-dependent expert routing in LLMs, enhancing efficiency and generalization.

Findings

01. Achieves state-of-the-art results on instruction-tuning benchmarks.

02. Maintains optimal performance-compute trade-off.

03. Automatically identifies domain-specific experts within LLMs.

Abstract

We present Sparse Interpolated Mixture-of-Experts (SIMoE) instruction-tuning, an end-to-end algorithm designed to fine-tune a dense pre-trained Large Language Model (LLM) into a MoE-style model that possesses capabilities in multiple specialized domains. During instruction-tuning, SIMoE automatically identifies multiple specialized experts under a specified sparsity constraint, with each expert representing a structurally sparse subset of the seed LLM's parameters that correspond to domain-specific knowledge within the data. SIMoE simultaneously learns an input-dependent expert merging strategy via a router network, leveraging rich cross-expert knowledge for superior downstream generalization that surpasses existing baselines. Empirically, SIMoE consistently achieves state-of-the-art performance on common instruction-tuning benchmarks while maintaining an optimal performance-compute…
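The abstract describes experts as structurally sparse subsets of the seed model's parameters, merged per input by a router. A minimal sketch of that idea for a single linear layer follows. It is an illustration under stated assumptions, not the authors' implementation: the row-wise masks are fixed by hand here (SIMoE learns them under a sparsity constraint), the router is a single linear map followed by a softmax, and all names (`merged_weights`, `simoe_like_forward`, `row_masks`, `router_W`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def merged_weights(seed_W, deltas, row_masks, alphas):
    """Return seed_W + sum_m alpha_m * (mask_m * delta_m).

    Assumption: each expert is a per-row (structured) mask over a delta
    to the seed weights; rows with mask 0 are untouched by that expert.
    """
    merged = [row[:] for row in seed_W]
    for delta, mask, alpha in zip(deltas, row_masks, alphas):
        for i, keep in enumerate(mask):
            if not keep:
                continue
            for j in range(len(merged[i])):
                merged[i][j] += alpha * delta[i][j]
    return merged


def simoe_like_forward(x, seed_W, deltas, row_masks, router_W):
    """Route on the input, merge experts into one weight matrix, apply it."""
    alphas = softmax(matvec(router_W, x))  # input-dependent merge weights
    W = merged_weights(seed_W, deltas, row_masks, alphas)
    return matvec(W, x), alphas


random.seed(0)
d_in, d_out, n_experts = 4, 3, 2
seed_W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
deltas = [
    [[random.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_out)]
    for _ in range(n_experts)
]
row_masks = [[1, 0, 0], [0, 0, 1]]  # hand-picked; learned in the real method
router_W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n_experts)]

x = [1.0, -0.5, 0.25, 2.0]
y, alphas = simoe_like_forward(x, seed_W, deltas, row_masks, router_W)
```

In this toy setup the middle output row is masked out for both experts, so it always equals the seed model's output for that row, while the first and last rows shift according to the router's weights for the given input.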

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Expert finding and Q&A systems · Data Quality and Management