Retraining-Free Merging of Sparse MoE via Hierarchical Clustering

I-Chun Chen; Hsu-Shen Liu; Wei-Fang Sun; Chen-Hao Chao; Yen-Chang Hsu; Chun-Yi Lee

arXiv:2410.08589·cs.LG·October 28, 2025

TL;DR

This paper presents HC-SMoE, a novel hierarchical clustering method for merging experts in sparse Mixture-of-Experts models without retraining, reducing parameters while maintaining performance across language tasks.

Contribution

Introduces HC-SMoE, a task-agnostic expert merging framework based on output clustering that enables parameter reduction without retraining in SMoE models.

Findings

1. HC-SMoE effectively reduces model size with minimal performance loss.
2. The method outperforms existing expert merging techniques.
3. The method is validated on large-scale models such as Qwen and Mixtral across multiple tasks.

Abstract

Sparse Mixture-of-Experts (SMoE) models represent a significant advancement in large language model (LLM) development through their efficient parameter utilization. These models achieve substantial performance improvements at reduced inference costs. However, the deployment of SMoE models faces constraints from extensive memory requirements of expert components in resource-limited environments. To address these limitations, this paper introduces Hierarchical Clustering for Sparsely activated Mixture of Experts (HC-SMoE), a task-agnostic expert merging framework for parameter reduction without retraining. HC-SMoE introduces a novel hierarchical clustering approach based on expert outputs to ensure merging robustness independent of routing decisions. The proposed output-based clustering method enables effective capture of functional relationships between experts for large-scale…
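
To make the idea of output-based hierarchical clustering concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes each expert is summarized by a mean output vector collected on some calibration inputs, uses Euclidean distance with average linkage, and merges each cluster by plain parameter averaging. The abstract only states that clustering is based on expert outputs, so the distance, the linkage, and the merging rule are all illustrative assumptions, and the names `cluster_experts` and `merge_weights` are hypothetical.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(c1, c2, feats):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(feats[i], feats[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def cluster_experts(feats, n_groups):
    """Agglomerative clustering: repeatedly merge the two closest clusters.

    feats[i] is the (assumed) mean output vector of expert i.
    Returns a list of clusters, each a sorted list of expert indices.
    """
    clusters = [[i] for i in range(len(feats))]
    while len(clusters) > n_groups:
        # Pick the pair of clusters with the smallest average-linkage distance.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], feats),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # safe: combinations guarantees a < b
    return [sorted(c) for c in clusters]


def merge_weights(weights, clusters):
    """Merge each cluster into one expert by averaging its parameters.

    Plain averaging is an assumption made for this sketch.
    """
    merged = []
    for c in clusters:
        dim = len(weights[c[0]])
        merged.append([sum(weights[i][k] for i in c) / len(c) for k in range(dim)])
    return merged


# Toy example: four experts whose outputs form two obvious groups.
mean_outputs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
expert_weights = [[2.0], [4.0], [10.0], [20.0]]  # one flattened parameter each

clusters = cluster_experts(mean_outputs, n_groups=2)
merged = merge_weights(expert_weights, clusters)
print(clusters)  # [[0, 1], [2, 3]]
print(merged)    # [[3.0], [15.0]]
```

In this toy run, four experts are reduced to two with no gradient updates, which is the sense in which such a procedure is retraining-free. In a real SMoE layer the router would also need to be remapped so that tokens sent to any expert in a cluster reach the merged expert; that step is omitted here.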

Code & Models

Repositories

wazenmai/hc-smoe (PyTorch, official)

Videos

Retraining-free Merging of Sparse MoE via Hierarchical Clustering (SlidesLive)

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Human Mobility and Location-Based Analysis · Privacy-Preserving Technologies in Data

Methods: Sparse Evolutionary Training