Unchosen Experts Can Contribute Too: Unleashing MoE Models' Power by Self-Contrast
Chufan Shi, Cheng Yang, Xinyu Zhu, Jiahao Wang, Taiqiang Wu, Siheng Li, Deng Cai, Yujiu Yang, Yu Meng

TL;DR
This paper introduces SCMoE, a simple and efficient method that leverages unchosen experts in MoE models through self-contrast during inference, significantly improving reasoning accuracy across multiple benchmarks.
Contribution
The paper proposes SCMoE, a training-free self-contrast strategy that puts the unchosen experts of an MoE model to use at inference time to improve output quality.
Findings
SCMoE improves Mixtral 8x7B's GSM8K accuracy from 61.79 to 66.94.
Combined with self-consistency, SCMoE increases major@20 accuracy from 75.59 to 78.31.
Using unchosen experts via self-contrast enhances MoE model reasoning capabilities.
Abstract
Mixture-of-Experts (MoE) has emerged as a prominent architecture for scaling model size while maintaining computational efficiency. In MoE, each token in the input sequence activates a different subset of experts determined by a routing mechanism. However, the unchosen experts in MoE models do not contribute to the output, potentially leading to underutilization of the model's capacity. In this work, we first conduct exploratory studies to demonstrate that increasing the number of activated experts does not necessarily improve and can even degrade the output quality. Then, we show that output distributions from an MoE model using different routing strategies substantially differ, indicating that different experts do not always act synergistically. Motivated by these findings, we propose Self-Contrast Mixture-of-Experts (SCMoE), a training-free strategy that utilizes unchosen experts in…
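The routing and self-contrast ideas described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation, and the abstract above is truncated before it states the exact method. The sketch assumes a contrastive-decoding-style combination of log-probabilities, and for simplicity it treats each expert as directly producing vocabulary logits, whereas real MoE layers mix hidden states. All function names and the parameters `beta` and `alpha` are hypothetical.

```python
import math


def log_softmax(xs):
    """Numerically stable log-softmax over a list of floats."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def ranked_experts(router_logits):
    """Expert indices sorted from highest to lowest router score."""
    return sorted(range(len(router_logits)),
                  key=lambda i: router_logits[i], reverse=True)


def top_k_route(router_logits, k):
    """Pick the k best experts and renormalize their router scores."""
    chosen = ranked_experts(router_logits)[:k]
    weights = [math.exp(lp) for lp in
               log_softmax([router_logits[i] for i in chosen])]
    return list(zip(chosen, weights))


def rank_k_route(router_logits, rank):
    """Pick only the expert at the given rank (1 = best).

    With rank > k this selects an expert that top-k routing leaves unchosen.
    """
    return [(ranked_experts(router_logits)[rank - 1], 1.0)]


def moe_output(expert_logits, routing):
    """Weighted sum of per-expert vocabulary logits (toy simplification)."""
    vocab = len(expert_logits[0])
    return [sum(w * expert_logits[e][v] for e, w in routing)
            for v in range(vocab)]


def self_contrast(strong_logits, weak_logits, beta=0.5, alpha=0.1):
    """Assumed contrast rule: (1 + beta) * log p_strong - beta * log p_weak.

    Tokens whose strong probability is below alpha times the best token's
    probability are masked out with -inf (a plausibility filter).
    """
    lp_s = log_softmax(strong_logits)
    lp_w = log_softmax(weak_logits)
    cutoff = math.log(alpha) + max(lp_s)
    return [(1 + beta) * s - beta * w if s >= cutoff else float("-inf")
            for s, w in zip(lp_s, lp_w)]


# Toy usage: 4 experts, vocabulary of 3 tokens (all numbers are made up).
router_logits = [2.0, 0.5, 1.0, -1.0]
expert_logits = [
    [3.0, 1.0, 0.0],
    [0.5, 2.5, 0.0],
    [2.0, 1.5, 0.0],
    [0.0, 0.0, 1.0],
]
strong = moe_output(expert_logits, top_k_route(router_logits, k=2))
weak = moe_output(expert_logits, rank_k_route(router_logits, rank=3))
scores = self_contrast(strong, weak)
next_token = scores.index(max(scores))
```

Here the "strong" pass uses ordinary top-2 routing and the "weak" pass routes every token to a single lower-ranked expert, so the contrast rewards tokens that the default routing prefers more strongly than the unchosen expert does. Both passes reuse the same model weights, which is what makes the idea training-free.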
Taxonomy
Topics: Big Data and Business Intelligence · Business Process Modeling and Analysis · Semantic Web and Ontologies
Methods: Mixture of Experts
