MetaMoE: Diversity-Aware Proxy Selection for Privacy-Preserving Mixture-of-Experts Unification

Weisen Jiang; Shuhao Chen; Sinno Jialin Pan

arXiv:2605.14289·cs.LG·May 15, 2026

TL;DR

MetaMoE is a privacy-preserving method for unifying distributed, domain-specific experts into a single Mixture-of-Experts model. It uses diverse public proxy data in place of inaccessible private data, which improves expert coordination and performance.

Contribution

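MetaMoE proposes diversity-aware proxy selection for privacy-preserving MoE unification: it picks public samples that are both relevant to each client's domain and diverse, then uses them to supervise router learning and to align expert training.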

Findings

01. MetaMoE outperforms recent privacy-preserving MoE unification methods.

02. The approach effectively aligns expert training using public proxy data.

03. Experiments demonstrate consistent improvements on vision and NLP benchmarks.

Abstract

Mixture-of-Experts (MoE) models scale capacity by combining specialized experts, but most existing approaches assume centralized access to training data. In practice, data are distributed across clients and cannot be shared due to privacy constraints, making unified MoE training challenging. We propose MetaMoE, a privacy-preserving framework that unifies independently trained, domain-specialized experts into a single MoE using public proxy data as surrogates for inaccessible private data. Central to MetaMoE is diversity-aware proxy selection, which selects client-domain-relevant and diverse samples from public data to effectively approximate private data distributions and supervise router learning. These proxies are further used to align expert training, improving expert coordination at unification time, while a context-aware router enhances expert selection across heterogeneous inputs.…
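
The abstract does not spell out how diversity-aware proxy selection is computed, so the following is only a minimal sketch of the general idea: pick public samples that are relevant to a client's domain while penalizing redundancy among the samples already picked. It swaps in a greedy maximal-marginal-relevance style rule, which is not confirmed to be the authors' method. The function name `select_proxies`, the `client_proto` domain prototype vector, and the `trade_off` weight are all hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def select_proxies(public_emb, client_proto, k, trade_off=0.5):
    """Greedily pick k public samples that are relevant and mutually diverse.

    ASSUMPTION: this is an illustrative MMR-style rule, not MetaMoE's
    actual selection criterion.

    public_emb   : (n, d) array of public-sample embeddings
    client_proto : (d,) hypothetical summary vector for one client domain
    trade_off    : 1.0 = pure relevance, lower values weight diversity more
    """
    public_emb = np.asarray(public_emb, dtype=float)
    client_proto = np.asarray(client_proto, dtype=float)

    # Normalise so that dot products are cosine similarities.
    pub = public_emb / np.linalg.norm(public_emb, axis=1, keepdims=True)
    proto = client_proto / np.linalg.norm(client_proto)

    relevance = pub @ proto              # similarity to the client domain
    redundancy = np.zeros(len(pub))      # max similarity to any chosen sample
    chosen = np.zeros(len(pub), dtype=bool)
    selected = []

    for _ in range(min(k, len(pub))):
        score = trade_off * relevance - (1.0 - trade_off) * redundancy
        score[chosen] = -np.inf          # never pick the same sample twice
        best = int(np.argmax(score))
        selected.append(best)
        chosen[best] = True
        # Update each sample's similarity to its nearest selected neighbour.
        redundancy = np.maximum(redundancy, pub @ pub[best])

    return selected


# Toy data: samples 0 and 1 are near-duplicates, sample 2 is less relevant
# to the prototype but points in a different direction.
toy_emb = np.array([[1.0, 0.0], [1.0, 0.05], [0.6, 0.8]])
toy_proto = np.array([1.0, 0.2])
print(select_proxies(toy_emb, toy_proto, k=2, trade_off=0.5))
print(select_proxies(toy_emb, toy_proto, k=2, trade_off=1.0))
```

On the toy data, the balanced setting skips the near-duplicate in favour of the more distinct sample, whereas pure relevance keeps both near-duplicates. In a setting like the one the abstract describes, the selected indices would then identify the public samples used as surrogates for a client's private data.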

Code & Models

Repositories

ws-jiang/MetaMoE (GitHub)
