SpeechMoE2: Mixture-of-Experts Model with Improved Routing
Zhao You, Shulin Feng, Dan Su, Dong Yu

TL;DR
SpeechMoE2 introduces a new routing architecture for mixture-of-experts speech recognition models that incorporates global domain and accent embeddings, significantly improving accuracy across diverse domains and accents without increasing computational cost.
Contribution
The paper presents an enhanced router design for SpeechMoE that integrates global domain and accent embeddings, boosting speech recognition performance across varied conditions.
Findings
Achieves up to 4.8% relative CER reduction on multi-domain tasks.
Achieves up to 17.7% relative CER reduction on multi-accent tasks.
Maintains constant computational cost while improving accuracy.
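For readers unfamiliar with the metric, the relative CER reductions above compare the new model's character error rate to the baseline's as a percentage of the baseline. A minimal sketch of that arithmetic follows; the function name and the CER values in the example are hypothetical and are not taken from the paper.

```python
def relative_cer_reduction(baseline_cer: float, new_cer: float) -> float:
    """Return the relative CER reduction in percent.

    Both arguments are character error rates in the same unit
    (for example, percent). The baseline must be positive.
    """
    if baseline_cer <= 0:
        raise ValueError("baseline_cer must be positive")
    return 100.0 * (baseline_cer - new_cer) / baseline_cer


# Hypothetical illustration only: these CER values are NOT from the paper.
example = relative_cer_reduction(baseline_cer=10.0, new_cer=9.52)
print(f"relative CER reduction: {example:.1f}%")
```

A relative reduction says nothing about the absolute error rate on its own, so the same percentage can correspond to very different absolute gains depending on the baseline.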
Abstract
Mixture-of-experts based acoustic models with dynamic routing mechanisms have shown promising results for speech recognition. The design of the router architecture is important for achieving large model capacity and high computational efficiency. Our previous work, SpeechMoE, uses only local grapheme embeddings to help routers make routing decisions. To further improve speech recognition performance across varying domains and accents, we propose a new router architecture that integrates additional global domain and accent embeddings into the router input to promote adaptability. Experimental results show that the proposed SpeechMoE2 achieves a lower character error rate (CER) than SpeechMoE with a comparable number of parameters on both multi-domain and multi-accent tasks. Primarily, the proposed method provides up to 1.6% - 4.8% relative CER improvement for the multi-domain task and 1.9% - 17.7%…
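The routing idea described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the abstract does not say how the embeddings are combined or how experts are selected, so concatenation, a single linear layer, top-1 selection, the inclusion of a frame feature, and all names and dimensions below are assumptions.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(frame_feat, local_emb, domain_emb, accent_emb, weights):
    """Choose one expert for a frame (assumed top-1 routing).

    The router input is the concatenation of a frame feature, a local
    embedding, and the global domain and accent embeddings (assumption).
    `weights` holds one row of router weights per expert.
    """
    router_input = frame_feat + local_emb + domain_emb + accent_emb
    logits = [sum(w * x for w, x in zip(row, router_input)) for row in weights]
    probs = softmax(logits)
    expert = max(range(len(probs)), key=probs.__getitem__)
    return expert, probs


# Hypothetical toy values; none of these come from the paper.
random.seed(0)
num_experts = 4
frame_feat = [0.2, -0.1, 0.4]   # per-frame acoustic feature
local_emb = [0.5, 0.1]          # local (grapheme-level) embedding
domain_emb = [1.0, 0.0]         # global domain embedding
accent_emb = [0.0, 1.0]         # global accent embedding
dim = len(frame_feat) + len(local_emb) + len(domain_emb) + len(accent_emb)
weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(num_experts)]

expert, probs = route(frame_feat, local_emb, domain_emb, accent_emb, weights)
print("selected expert:", expert)
```

Because only the router input grows while a single expert is still evaluated per frame in this sketch, the expert computation itself is unchanged, which is consistent with the summary's claim of comparable cost.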
