Mosaic Pruning: A Hierarchical Framework for Generalizable Pruning of Mixture-of-Experts Models

Wentao Hu; Mingkuan Zhao; Shuangyong Song; Xiaoyan Zhu; Xin Lai; Jiayin Wang

arXiv:2511.19822·cs.LG·November 26, 2025

TL;DR

Mosaic Pruning is a hierarchical framework that constructs a diverse, representative set of experts in Mixture-of-Experts models, enhancing their generalization across multiple domains and tasks while reducing memory overhead.

Contribution

It introduces a structured clustering and selection process for pruning experts, improving domain generalization and performance in Mixture-of-Experts models.

Findings

01 Achieves 7.24% improvement on general tasks

02 Achieves 8.92% improvement on specialized tasks

03 Outperforms prior pruning methods significantly

Abstract

Sparse Mixture-of-Experts (SMoE) architectures have enabled a new frontier in scaling Large Language Models (LLMs), offering superior performance by activating only a fraction of their total parameters during inference. However, their practical deployment is severely hampered by substantial static memory overhead, as all experts must be loaded into memory. Existing post-training pruning methods, while reducing model size, often derive their pruning criteria from a single, general-purpose corpus. This leads to a critical limitation: a catastrophic performance degradation when the pruned model is applied to other domains, necessitating a costly re-pruning for each new domain. To address this generalization gap, we introduce Mosaic Pruning (MoP). The core idea of MoP is to construct a functionally comprehensive set of experts through a structured "cluster-then-select" process. This…
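
The "cluster-then-select" idea named in the abstract can be illustrated with a small sketch. The abstract is truncated here and does not say which expert features, clustering algorithm, or selection criterion MoP uses, so everything below is an assumption made for illustration: each expert is described by a hypothetical feature vector, experts are grouped with plain k-means, and the highest-scoring expert in each cluster (by a hypothetical importance score) is kept. The function names are also invented for this example and are not taken from the paper.

```python
import math
from typing import List, Sequence


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_experts(features: Sequence[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Group experts with plain k-means (an assumed stand-in, not the paper's method).

    Centers are initialized deterministically by farthest-point selection.
    Returns one cluster label per expert.
    """
    centers = [list(features[0])]
    while len(centers) < k:
        # Pick the expert farthest from its nearest existing center.
        far = max(range(len(features)),
                  key=lambda i: min(_dist(features[i], c) for c in centers))
        centers.append(list(features[far]))
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: nearest center for each expert.
        labels = [min(range(k), key=lambda j: _dist(f, centers[j])) for f in features]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [features[i] for i, lab in enumerate(labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def select_representatives(labels: Sequence[int], scores: Sequence[float]) -> List[int]:
    """Keep the highest-scoring expert from each cluster (assumed criterion)."""
    best = {}
    for idx, (lab, score) in enumerate(zip(labels, scores)):
        if lab not in best or score > scores[best[lab]]:
            best[lab] = idx
    return sorted(best.values())


def cluster_then_select(features, scores, keep: int) -> List[int]:
    """Return the indices of the experts to retain; all others would be pruned."""
    return select_representatives(cluster_experts(features, keep), scores)


# Toy data: six experts forming three functional groups (hypothetical values).
features = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (10.0, 0.0), (10.0, 0.2)]
scores = [0.2, 0.9, 0.5, 0.4, 0.1, 0.3]
kept = cluster_then_select(features, scores, keep=3)
print(kept)
```

Keeping one expert per cluster is what makes the retained set diverse: ranking by score alone could keep several near-duplicate experts from one group and drop another group entirely. If two experts share identical features, a cluster can end up empty and fewer than `keep` experts are returned.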

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Mobile Crowdsensing and Crowdsourcing · Topic Modeling