TL;DR
MetaMoE is a privacy-preserving framework that unifies independently trained, domain-specific experts into a single Mixture-of-Experts model. It uses diverse public proxy data in place of inaccessible private data to train the router and align the experts.
Contribution
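The paper proposes diversity-aware proxy selection for privacy-preserving MoE unification: it picks public samples that are both relevant to each client domain and diverse, so they can approximate private data distributions, supervise router learning, and improve expert coordination.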
Findings
MetaMoE outperforms recent privacy-preserving MoE unification methods.
The approach effectively aligns expert training using public proxy data.
Experiments demonstrate consistent improvements on vision and NLP benchmarks.
Abstract
Mixture-of-Experts (MoE) models scale capacity by combining specialized experts, but most existing approaches assume centralized access to training data. In practice, data are distributed across clients and cannot be shared due to privacy constraints, making unified MoE training challenging. We propose MetaMoE, a privacy-preserving framework that unifies independently trained, domain-specialized experts into a single MoE using public proxy data as surrogates for inaccessible private data. Central to MetaMoE is diversity-aware proxy selection, which selects client-domain-relevant and diverse samples from public data to effectively approximate private data distributions and supervise router learning. These proxies are further used to align expert training, improving expert coordination at unification time, while a context-aware router enhances expert selection across heterogeneous inputs.…
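The abstract does not say how relevance and diversity are scored, so the following is only an illustrative sketch of what "diversity-aware proxy selection" could look like, not the authors' algorithm. It assumes each public sample has an embedding and that each client domain can be summarized by a single centroid vector. It then greedily picks samples that are close to the centroid but not redundant with samples already picked (a maximal-marginal-relevance-style rule, swapped in here for illustration). The names `select_proxies`, `public_emb`, `domain_centroid`, and `trade_off` are hypothetical.

```python
import numpy as np

def select_proxies(public_emb, domain_centroid, k, trade_off=0.5):
    """Greedily pick k public samples that are relevant to a domain and mutually diverse.

    Assumption: this MMR-style rule is a stand-in, not the selection rule from the paper.
    trade_off=1.0 ranks by relevance only; lower values penalize redundancy more.
    """
    # Normalize so that dot products are cosine similarities.
    pub = np.asarray(public_emb, dtype=float)
    pub = pub / np.linalg.norm(pub, axis=1, keepdims=True)
    cen = np.asarray(domain_centroid, dtype=float)
    cen = cen / np.linalg.norm(cen)

    n = len(pub)
    relevance = pub @ cen            # similarity of each sample to the domain
    redundancy = np.zeros(n)         # max similarity to any selected sample, floored at 0
    chosen = np.zeros(n, dtype=bool)
    selected = []

    for _ in range(min(k, n)):
        score = trade_off * relevance - (1.0 - trade_off) * redundancy
        score[chosen] = -np.inf      # never pick the same sample twice
        best = int(np.argmax(score))
        selected.append(best)
        chosen[best] = True
        # Update each sample's redundancy against the newly selected one.
        redundancy = np.maximum(redundancy, pub @ pub[best])

    return selected

# Toy data: samples 0 and 1 are near-duplicates, 2 is relevant but different, 3 is off-domain.
public = np.array([
    [0.3, 0.0, 1.0],
    [0.3, 0.01, 1.0],
    [-0.3, 0.0, 0.9],
    [1.0, 0.0, 0.0],
])
centroid = np.array([0.0, 0.0, 1.0])

diverse = select_proxies(public, centroid, k=2)                      # skips the near-duplicate
relevance_only = select_proxies(public, centroid, k=2, trade_off=1.0)  # keeps both duplicates
```

In this toy setup, the diversity term makes the second pick skip the near-duplicate in favor of a different but still on-domain sample, which is the intuition behind using diverse proxies to cover a private distribution. In a real privacy-preserving setting, how the domain summary is obtained without exposing private data is a separate question that this sketch does not address.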
