CartesianMoE: Boosting Knowledge Sharing among Experts via Cartesian Product Routing in Mixture-of-Experts
Zhenpeng Su, Xing Wu, Zijia Lin, Yizhe Xiong, Minxuan Lv, Guangyuan Ma, Hui Chen, Songlin Hu, Guiguang Ding

TL;DR
CartesianMoE introduces a novel routing mechanism for Mixture-of-Experts models that enhances knowledge sharing through a multiplicative approach, leading to improved performance and robustness in large language models.
Contribution
It proposes CartesianMoE, a new knowledge sharing method in MoE models inspired by collective matrix factorization, improving routing robustness and model performance.
Findings
Outperforms previous MoE models in perplexity and downstream tasks.
Achieves better expert routing robustness.
Demonstrates effectiveness in large language model training.
Abstract
Large language models (LLMs) have recently attracted much attention from the community due to their remarkable performance on a wide range of downstream tasks. According to the well-known scaling law, scaling up a dense LLM enhances its capabilities but also significantly increases its computational complexity. Mixture-of-Experts (MoE) models address this by allowing the model size to grow without substantially raising training or inference costs. Yet MoE models face challenges regarding knowledge sharing among experts, which makes their performance somewhat sensitive to routing accuracy. To tackle this, previous works introduced shared experts and combined their outputs with those of the top routed experts in an "addition" manner. In this paper, inspired by collective matrix factorization to learn shared knowledge among data, we propose CartesianMoE, which implements more effective…
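The "addition" manner of combining a shared expert with the top routed experts, as the abstract describes for previous works, can be sketched roughly as follows. This is a toy illustration of that baseline, not CartesianMoE itself and not the authors' code. The function name `moe_with_shared_expert`, the single-matrix experts, the softmax gate weights, and the sizes are all assumptions made for the example.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over a list of logits.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    # Toy "expert": a single matrix-vector product (assumption; real experts are FFNs).
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def moe_with_shared_expert(x, router, routed_experts, shared_expert, top_k=2):
    # Router produces one logit per routed expert, normalized with softmax.
    probs = softmax(linear(router, x))
    # Keep the top_k routed experts by routing probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # The shared expert processes every token regardless of the routing decision.
    out = linear(shared_expert, x)
    # "Addition" manner: add gate-weighted routed outputs to the shared output.
    for i in top:
        y = linear(routed_experts[i], x)
        out = [o + probs[i] * yi for o, yi in zip(out, y)]
    return out, top

def random_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

# Hypothetical sizes chosen only for the demonstration.
rng = random.Random(0)
dim, num_experts = 4, 8
router = random_matrix(rng, num_experts, dim)
routed_experts = [random_matrix(rng, dim, dim) for _ in range(num_experts)]
shared_expert = random_matrix(rng, dim, dim)
x = [rng.uniform(-1, 1) for _ in range(dim)]

output, chosen = moe_with_shared_expert(x, router, routed_experts, shared_expert)
```

In this sketch the shared expert contributes to every token, so a poor routing choice still leaves the shared term intact. That is the sense in which shared experts reduce sensitivity to routing accuracy.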
Taxonomy
Topics: Expert finding and Q&A systems · Semantic Web and Ontologies · Mobile Crowdsensing and Crowdsourcing
Methods: Softmax · Attention Is All You Need · Mixture of Experts
