SpeechMoE2: Mixture-of-Experts Model with Improved Routing

Zhao You; Shulin Feng; Dan Su; Dong Yu

arXiv:2111.11831 · eess.AS · November 24, 2021

TL;DR

SpeechMoE2 introduces a new routing architecture for mixture-of-experts speech recognition models that incorporates global domain and accent embeddings, significantly improving accuracy across diverse domains and accents without increasing computational cost.

Contribution

The paper presents an enhanced router design for SpeechMoE that integrates global domain and accent embeddings, boosting speech recognition performance across varied conditions.
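The routing idea can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the router concatenates a frame-level feature with a local grapheme embedding and global domain and accent embeddings, applies a single linear projection, and picks the top-1 expert. The function names, dimensions, and random weights are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(frame_feat, grapheme_emb, domain_emb, accent_emb, weights):
    """Pick one expert for a single frame (hypothetical top-1 router).

    The router input is the concatenation of the local and global
    embeddings; `weights` holds one row of projection weights per expert.
    """
    router_input = frame_feat + grapheme_emb + domain_emb + accent_emb
    logits = [sum(w * x for w, x in zip(row, router_input)) for row in weights]
    probs = softmax(logits)
    expert = max(range(len(probs)), key=probs.__getitem__)
    return expert, probs[expert]


# Illustrative values only; none of these come from the paper.
random.seed(0)
num_experts = 4
frame_feat = [0.2, -0.1, 0.5]   # local acoustic feature
grapheme_emb = [0.3, 0.7]       # local grapheme embedding
domain_emb = [1.0, 0.0]         # global domain embedding
accent_emb = [0.0, 1.0]         # global accent embedding
input_dim = len(frame_feat) + len(grapheme_emb) + len(domain_emb) + len(accent_emb)
weights = [[random.uniform(-1.0, 1.0) for _ in range(input_dim)]
           for _ in range(num_experts)]

expert, gate = route(frame_feat, grapheme_emb, domain_emb, accent_emb, weights)
print(f"frame routed to expert {expert} with gate value {gate:.3f}")
```

Because only the selected expert runs for each frame, widening the router input with extra embeddings changes which expert is chosen without adding experts to the per-frame computation.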

Findings

01

Achieves up to 4.8% relative CER reduction on multi-domain tasks.

02

Achieves up to 17.7% relative CER reduction on multi-accent tasks.

03

Maintains constant computational cost while improving accuracy.
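For readers unfamiliar with the metric, a relative CER reduction compares the drop in error rate to the baseline error rate. The sketch below shows the arithmetic; the baseline and improved CER values are hypothetical numbers chosen for illustration and are not results from the paper.

```python
def relative_cer_reduction(baseline_cer, new_cer):
    """Return the relative CER reduction in percent."""
    if baseline_cer <= 0:
        raise ValueError("baseline CER must be positive")
    return 100.0 * (baseline_cer - new_cer) / baseline_cer


# Hypothetical example: a baseline CER of 10.0% lowered to 9.5%.
reduction = relative_cer_reduction(10.0, 9.5)
print(f"relative CER reduction: {reduction:.1f}%")
```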

Abstract

Mixture-of-experts based acoustic models with dynamic routing mechanisms have shown promising results for speech recognition. The design of the router architecture is important for achieving large model capacity and high computational efficiency. Our previous work, SpeechMoE, uses only local grapheme embeddings to help routers make routing decisions. To further improve speech recognition performance across varying domains and accents, we propose a new router architecture that integrates additional global domain and accent embeddings into the router input to promote adaptability. Experimental results show that the proposed SpeechMoE2 achieves a lower character error rate (CER) than SpeechMoE with a comparable number of parameters on both multi-domain and multi-accent tasks. Primarily, the proposed method provides up to 1.6% - 4.8% relative CER improvement for the multi-domain task and 1.9% - 17.7%…
