GW-MoE: Resolving Uncertainty in MoE Router with Global Workspace Theory
Haoze Wu, Zihan Qiu, Zili Wang, Hang Zhao, Jie Fu

TL;DR
This paper introduces GW-MoE, a fine-tuning method inspired by Global Workspace Theory, which reduces routing uncertainty in Mixture-of-Experts models, leading to improved performance across various tasks and model sizes without extra inference costs.
Contribution
The paper proposes GW-MoE, a fine-tuning approach inspired by Global Workspace Theory. It broadcasts tokens with uncertain routing to all experts during fine-tuning, so these tokens become less sensitive to which expert is selected at inference.
Findings
GW-MoE mitigates the routing uncertainty problem in MoE models.
It improves performance across multiple NLP tasks.
It is effective for models with 650M and 8B parameters.
Abstract
Mixture-of-Experts (MoE) has been demonstrated as an efficient method to scale up models. By dynamically and sparsely selecting activated experts, MoE can effectively reduce computational costs. Despite the success, we observe that many tokens in the MoE models have uncertain routing results. These tokens have nearly equal scores for choosing each expert, and we demonstrate that this uncertainty can lead to incorrect selections. Inspired by the Global Workspace Theory (GWT), we propose a new fine-tuning method, GW-MoE, to address this issue. The core idea is to broadcast the uncertain tokens across experts during fine-tuning. Therefore, these tokens can acquire the necessary knowledge from any expert during inference and become less sensitive to the choice. GW-MoE does not introduce additional inference overhead. We validate that GW can mitigate the uncertain problem and consistently…
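The broadcasting idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract only says that uncertain tokens have "nearly equal scores" for each expert, so measuring uncertainty with normalized entropy, the threshold of 0.95, top-2 routing, and the names `normalized_entropy` and `select_experts` are all assumptions made here for illustration.

```python
import math


def softmax(logits):
    """Convert router logits into a probability distribution over experts."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(num_experts).

    0 means the router is fully confident; 1 means all experts score equally.
    Using entropy as the uncertainty measure is an assumption of this sketch.
    """
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def select_experts(router_logits, top_k=2, threshold=0.95, fine_tuning=True):
    """Return the indices of the experts that process one token.

    During fine-tuning, a token whose routing is uncertain is broadcast to
    every expert. At inference, routing is always plain top-k, so no extra
    cost is added. `top_k` and `threshold` are hypothetical values.
    """
    probs = softmax(router_logits)
    if fine_tuning and normalized_entropy(probs) > threshold:
        return list(range(len(probs)))  # broadcast to all experts
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return sorted(ranked[:top_k])


if __name__ == "__main__":
    confident_token = [4.0, 2.0, 0.0, -1.0]    # one expert clearly preferred
    uncertain_token = [0.02, 0.01, -0.01, 0.0]  # nearly equal scores
    print(select_experts(confident_token))
    print(select_experts(uncertain_token))
    print(select_experts(uncertain_token, fine_tuning=False))
```

In this sketch the uncertain token reaches every expert only when `fine_tuning=True`. With `fine_tuning=False` it falls back to ordinary top-k selection, which matches the abstract's claim that the method adds no inference overhead.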
Taxonomy
Topics: Manufacturing Process and Optimization · Simulation Techniques and Applications · Model-Driven Software Engineering Techniques
Methods: Mixture of Experts
