Language-Routing Mixture of Experts for Multilingual and Code-Switching Speech Recognition

Wenxuan Wang; Guodong Ma; Yuke Li; Binbin Du

arXiv:2307.05956 · cs.SD · July 17, 2023

Open Access

TL;DR

This paper introduces LR-MoE, a computation-efficient multilingual and code-switching speech recognition model that uses language-specific experts and a frame-wise routing mechanism to improve accuracy while maintaining efficiency.

Contribution

The paper proposes LR-MoE, a novel language-routing MoE model that enhances multilingual and code-switching speech recognition with reduced computational complexity.
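
The efficiency claim can be made concrete with a back-of-the-envelope count. The sketch below assumes, for illustration only, that a compute-heavy MoE baseline runs every language expert on every frame, while a routed model runs one expert per frame per layer plus a shared LID classifier. The feed-forward expert shape, the sizes (256, 1024, 6 layers), the linear LID classifier and the function names are hypothetical and are not figures from the paper.

```python
def ffn_macs(dim, hidden):
    # Multiply-accumulates for one feed-forward expert (up- and down-projection).
    return 2 * dim * hidden


def per_frame_expert_macs(dim, hidden, num_languages, num_layers, routed):
    """Per-frame MACs spent in language experts across all MoE layers.

    routed=False: every language expert processes every frame (assumed baseline).
    routed=True:  each frame visits one expert per layer, plus one linear LID
                  classifier evaluated once and shared by all layers (assumption).
    """
    experts_per_layer = 1 if routed else num_languages
    macs = num_layers * experts_per_layer * ffn_macs(dim, hidden)
    if routed:
        macs += dim * num_languages  # shared LID classifier cost
    return macs


for n in (2, 4, 8):
    dense = per_frame_expert_macs(256, 1024, n, num_layers=6, routed=False)
    routed = per_frame_expert_macs(256, 1024, n, num_layers=6, routed=True)
    print(f"{n} languages: all-experts={dense:,} MACs, routed={routed:,} MACs")
```

Under these assumptions the all-experts cost grows linearly with the number of supported languages, while the routed cost grows only by the small LID classifier term.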

Findings

1. Significantly improves multilingual and code-switching speech recognition performance over baseline models.
2. Maintains computational efficiency comparable to existing methods.
3. Learns effective language-specific representations through the Mixture of Language Experts.

Abstract

Multilingual speech recognition for both monolingual and code-switching speech is a challenging task. Recently, many works based on the Mixture of Experts (MoE) have made good progress in multilingual and code-switching ASR, but their computational complexity grows substantially as the number of supported languages increases. In this work, we propose a computation-efficient network named Language-Routing Mixture of Experts (LR-MoE) for multilingual and code-switching ASR. LR-MoE extracts language-specific representations through the Mixture of Language Experts (MLE), whose learning is guided by a frame-wise language routing mechanism. The weight-shared frame-level language identification (LID) network is jointly trained as the shared pre-router of each MoE layer. Experiments show that the proposed method significantly improves multilingual and code-switching speech recognition performance over baseline…
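
The abstract describes frame-wise language routing only at a high level. The following is a minimal pure-Python sketch of the idea, not the authors' implementation. Assumptions not taken from the source: each language expert is a single linear map with a residual connection; the LID router is a single linear layer followed by softmax; every frame is sent to its top-1 language (hard routing); routes are computed once by the shared router and reused by every MoE layer, as one reading of "shared pre-router"; and the joint LID training loss is omitted. The names SharedLIDRouter, LanguageRoutingMoELayer and lr_moe_encoder are hypothetical. Because routing is per frame, different frames of one code-switched utterance can reach different language experts.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(x, weight, bias):
    # y = W x + b, with W stored as a list of rows (out_dim x in_dim).
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def random_linear(in_dim, out_dim, rng):
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


class SharedLIDRouter:
    """Hypothetical frame-level LID classifier; one instance serves all MoE layers."""

    def __init__(self, dim, num_languages, rng):
        self.weight, self.bias = random_linear(dim, num_languages, rng)

    def route(self, frames):
        # Per-frame language posteriors, then hard top-1 routing (assumption).
        probs = [softmax(linear(x, self.weight, self.bias)) for x in frames]
        routes = [max(range(len(p)), key=p.__getitem__) for p in probs]
        return probs, routes


class LanguageRoutingMoELayer:
    """Hypothetical MoE layer with one expert per language (a linear map here)."""

    def __init__(self, dim, num_languages, rng):
        self.experts = [random_linear(dim, dim, rng) for _ in range(num_languages)]

    def forward(self, frames, routes):
        out = []
        for x, lang in zip(frames, routes):
            weight, bias = self.experts[lang]  # only the routed expert runs for this frame
            y = linear(x, weight, bias)
            out.append([xi + yi for xi, yi in zip(x, y)])  # residual connection (assumption)
        return out


def lr_moe_encoder(frames, router, layers):
    # Routes are computed once by the shared router and reused by every layer.
    probs, routes = router.route(frames)
    hidden = frames
    for layer in layers:
        hidden = layer.forward(hidden, routes)
    return hidden, probs, routes


# Usage: 5 frames of 4-dim features, 2 languages, 3 MoE layers.
rng = random.Random(0)
router = SharedLIDRouter(dim=4, num_languages=2, rng=rng)
layers = [LanguageRoutingMoELayer(dim=4, num_languages=2, rng=rng) for _ in range(3)]
frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
hidden, probs, routes = lr_moe_encoder(frames, router, layers)
print(routes)  # one language index per frame
```

Computing the routes once and sharing them keeps the routing cost independent of the number of MoE layers; a per-layer router would be an equally plausible variant under different assumptions.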

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research