Retraining-Free Merging of Sparse MoE via Hierarchical Clustering
I-Chun Chen, Hsu-Shen Liu, Wei-Fang Sun, Chen-Hao Chao, Yen-Chang Hsu, Chun-Yi Lee

TL;DR
This paper presents HC-SMoE, a novel hierarchical clustering method for merging experts in sparse Mixture-of-Experts models without retraining, reducing parameters while maintaining performance across language tasks.
Contribution
Introduces HC-SMoE, a task-agnostic expert merging framework based on output clustering that enables parameter reduction without retraining in SMoE models.
Findings
HC-SMoE effectively reduces model size with minimal performance loss.
The method outperforms existing expert merging techniques.
The method is validated on large-scale models such as Qwen and Mixtral across multiple tasks.
Abstract
Sparse Mixture-of-Experts (SMoE) models represent a significant advancement in large language model (LLM) development through their efficient parameter utilization. These models achieve substantial performance improvements at reduced inference costs. However, the deployment of SMoE models faces constraints from extensive memory requirements of expert components in resource-limited environments. To address these limitations, this paper introduces Hierarchical Clustering for Sparsely activated Mixture of Experts (HC-SMoE), a task-agnostic expert merging framework for parameter reduction without retraining. HC-SMoE introduces a novel hierarchical clustering approach based on expert outputs to ensure merging robustness independent of routing decisions. The proposed output-based clustering method enables effective capture of functional relationships between experts for large-scale…
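To make the idea of output-based clustering concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes each expert is summarized by a mean output vector over some calibration inputs, uses Euclidean distance with average linkage, and merges each cluster by uniform weight averaging. All of these choices, along with the names `cluster_experts`, `merge_weights`, and the toy data, are illustrative assumptions. Updating the router to point at the merged experts is omitted.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_experts(mean_outputs, n_clusters):
    """Agglomerative clustering of experts by their mean outputs.

    mean_outputs: one vector per expert (assumed to be averages of the
    expert's outputs over calibration inputs).
    Returns a sorted list of clusters, each a sorted list of expert indices.
    """
    clusters = [[i] for i in range(len(mean_outputs))]

    def average_linkage(c1, c2):
        # Mean pairwise distance between members of the two clusters
        # (the linkage choice is an assumption of this sketch).
        total = sum(
            euclidean(mean_outputs[i], mean_outputs[j]) for i in c1 for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)


def merge_weights(expert_weights, clusters):
    """Average flattened weights within each cluster.

    Uniform averaging is an assumption; other merging rules are possible.
    """
    merged = []
    for members in clusters:
        dim = len(expert_weights[members[0]])
        merged.append(
            [sum(expert_weights[m][d] for m in members) / len(members)
             for d in range(dim)]
        )
    return merged


# Toy data: 4 experts with 2-D mean outputs and 2-D flattened weights.
mean_outputs = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.0, 5.2]]
expert_weights = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]

clusters = cluster_experts(mean_outputs, n_clusters=2)
merged = merge_weights(expert_weights, clusters)
print(clusters)  # [[0, 1], [2, 3]]
print(merged)    # [[2.0, 3.0], [20.0, 30.0]]
```

Because the grouping here depends only on what the experts output, not on how often the router selects them, the toy clustering is independent of routing decisions, which is the property the abstract emphasizes.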
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Human Mobility and Location-Based Analysis · Privacy-Preserving Technologies in Data
Methods: Sparse Evolutionary Training
