On the Role of Discrete Representation in Sparse Mixture of Experts

Giang Do; Kha Pham; Hung Le; Truyen Tran

arXiv:2411.19402·cs.LG·July 29, 2025

TL;DR

This paper introduces VQMoE, a novel sparse mixture of experts architecture that uses learned discrete input representations via vector quantization to improve routing robustness and performance.

Contribution

The paper proposes an expert routing method based on discrete representations, addressing the routing inconsistency and representation collapse seen in traditional SMoE routers.

Findings

1. VQMoE achieves a 28% improvement in robustness compared with other SMoE routing methods.

2. VQMoE maintains strong performance in fine-tuning tasks.

3. Theoretical and empirical evidence supports VQMoE's effectiveness.

Abstract

Sparse mixture of experts (SMoE) is an effective solution for scaling up model capacity without increasing computational cost. A crucial component of SMoE is the router, which is responsible for directing each input to the relevant experts; however, it is also a major weakness, leading to routing inconsistencies and representation collapse. Instead of fixing the router as previous works do, we propose an alternative that assigns experts to inputs via indirection: a discrete representation of the input points to the expert. The discrete representations are learnt via vector quantization, resulting in a new architecture dubbed Vector-Quantized Mixture of Experts (VQMoE). We provide theoretical support and empirical evidence demonstrating VQMoE's ability to overcome the challenges present in traditional routers. Through extensive evaluations on both large language…
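
The indirection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the architecture's details, so the fixed codebook, the one-to-one `CODE_TO_EXPERT` table, and the helper names `quantize` and `route_by_code` are all assumptions made for illustration. In the paper the discrete representations are learnt; here the codebook is hard-coded to keep the example small.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(x: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook vector nearest to x.

    This nearest-neighbour lookup is the standard vector quantization
    step; the index is the input's discrete representation.
    """
    return min(range(len(codebook)), key=lambda k: squared_distance(x, codebook[k]))


def route_by_code(
    tokens: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
    code_to_expert: Sequence[int],
) -> List[int]:
    """Assign each token to an expert through its discrete code.

    No router scores are computed: the code index points to the expert.
    """
    return [code_to_expert[quantize(t, codebook)] for t in tokens]


# Hypothetical, hand-picked codebook (assumption: learnt in the real model).
CODEBOOK = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
# Hypothetical mapping: code k points to expert k (assumption).
CODE_TO_EXPERT = [0, 1, 2]

tokens = [[0.1, -0.1], [0.9, 1.2], [-0.8, 0.7]]
assignments = route_by_code(tokens, CODEBOOK, CODE_TO_EXPERT)
print(assignments)  # [0, 1, 2]
```

Because the assignment depends only on which codebook entry is nearest, two inputs that map to the same code always reach the same expert, which is one way to read the abstract's claim about avoiding routing inconsistency.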

Taxonomy

Topics: Bayesian Methods and Mixture Models