On the Role of Discrete Representation in Sparse Mixture of Experts
Giang Do, Kha Pham, Hung Le, Truyen Tran

TL;DR
This paper introduces VQMoE, a novel sparse mixture of experts architecture that uses learned discrete input representations via vector quantization to improve routing robustness and performance.
Contribution
It proposes a new expert routing method using discrete representations, addressing issues of inconsistency and collapse in traditional SMoE routers.
Findings
VQMoE improves robustness by 28% over existing SMoE routing methods.
VQMoE maintains strong performance in fine-tuning tasks.
Theoretical and empirical evidence support VQMoE's effectiveness.
Abstract
Sparse mixture of experts (SMoE) is an effective solution for scaling up model capacity without increasing computational cost. A crucial component of SMoE is the router, which directs each input to the relevant experts; however, it is also a major weakness, leading to routing inconsistencies and representation collapse. Rather than fixing the router as previous works do, we propose an alternative that assigns experts to inputs via indirection: a discrete representation of the input points to the expert. The discrete representations are learnt via vector quantization, resulting in a new architecture dubbed Vector-Quantized Mixture of Experts (VQMoE). We provide theoretical support and empirical evidence demonstrating VQMoE's ability to overcome the challenges present in traditional routers. Through extensive evaluations on both large language…
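The indirection described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `vq_route` and `vq_moe_forward`, the one-code-per-expert mapping, the linear experts, and the fixed codebook are all assumptions made for illustration, and codebook learning is omitted entirely. The sketch only shows the core idea that a token's nearest code vector, rather than a learned router's score, selects its expert.

```python
import numpy as np

def vq_route(x, codebook):
    """Assign each input vector to its nearest codebook entry.

    x: (n_tokens, d) inputs; codebook: (n_codes, d) code vectors.
    Returns one code index per token. In this sketch (an assumption),
    code i points directly to expert i.
    """
    # Squared Euclidean distance between every token and every code vector.
    dists = ((x[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def vq_moe_forward(x, codebook, expert_weights):
    """Send each token through the single expert its code points to.

    expert_weights: (n_codes, d, d_out), one hypothetical linear expert per code.
    """
    idx = vq_route(x, codebook)
    out = np.empty((x.shape[0], expert_weights.shape[2]))
    for e in range(codebook.shape[0]):
        mask = idx == e
        if mask.any():
            out[mask] = x[mask] @ expert_weights[e]
    return out, idx

# Toy data: three codes (hence three experts) in a 2-D input space.
rng = np.random.default_rng(0)
codebook = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
expert_weights = rng.normal(size=(3, 2, 4))
x = np.array([[0.9, 0.1], [0.1, 1.2], [-0.8, -1.1], [1.1, -0.1]])

y, idx = vq_moe_forward(x, codebook, expert_weights)
print(idx)      # code (and expert) index chosen for each token
print(y.shape)  # one output vector per token
```

Because the assignment depends only on distances to the codebook, two nearby inputs map to the same expert by construction, which is one way to read the routing-consistency argument in the abstract.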
Taxonomy
Topics: Bayesian Methods and Mixture Models
