Dynamic Language Group-Based MoE: Enhancing Code-Switching Speech Recognition with Hierarchical Routing
Hukai Huang, Shenghui Lu, Yahui Shan, He Qu, Fengrun Zhang, Wenhao Guan, Qingyang Hong, Lin Li

TL;DR
This paper introduces DLG-MoE, a hierarchical routing-based MoE model that significantly improves code-switching speech recognition by effectively leveraging parameter scaling and offering flexible inference and pruning capabilities.
Contribution
The paper proposes DLG-MoE, a novel hierarchical routing MoE model that enhances CS-ASR performance and flexibility over existing MoE approaches.
Findings
DLG-MoE outperforms existing MoE methods on CS-ASR tasks.
Supports inference with different top-k values, as well as streaming.
Enables flexible model pruning to create monolingual sub-models.
Abstract
The Mixture of Experts (MoE) model is a promising approach for code-switching speech recognition (CS-ASR). However, existing MoE work on CS-ASR has yet to fully exploit MoE's ability to scale parameters. This work proposes DLG-MoE, a Dynamic Language Group-based MoE, which can effectively handle the CS-ASR task while taking advantage of parameter scaling. DLG-MoE is built on a hierarchical routing mechanism. First, a language router explicitly models the language attribute and dispatches representations to the corresponding language expert groups. Then, an unsupervised router within each language group implicitly models attributes beyond language and coordinates expert routing and collaboration. DLG-MoE outperforms existing MoE methods on CS-ASR tasks while offering considerable flexibility. It supports different top-…
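The two-level routing described in the abstract can be illustrated with a small pure-Python sketch. This is a minimal illustration under stated assumptions and not the authors' implementation. All class and method names (`LanguageGroup`, `DLGMoELayer`, `build`, `prune_to`) and the language labels `zh`/`en` are hypothetical. The sketch makes the following simplifications:

- The language router makes a hard argmax choice per frame using random weights. The training signal that lets it model language explicitly is omitted.
- Each group's unsupervised router is assumed to be a softmax gate with renormalized top-k selection.
- Each expert is a single linear map.
- Residual connections, normalization, and load-balancing losses are left out.

```python
import math
import random
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x dim) matrix by a length-dim vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def rand_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class LanguageGroup:
    """Hypothetical language expert group: an unsupervised top-k router over its own experts."""

    def __init__(self, dim: int, num_experts: int, rng: random.Random):
        self.router = rand_matrix(num_experts, dim, rng)  # one gating row per expert
        # Assumption: each expert is a single linear map (real experts would be FFNs).
        self.experts = [rand_matrix(dim, dim, rng) for _ in range(num_experts)]

    def forward(self, x: Vector, top_k: int) -> Vector:
        probs = softmax(matvec(self.router, x))
        top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
        norm = sum(probs[i] for i in top)  # renormalize kept gate weights to sum to 1
        out = [0.0] * len(x)
        for i in top:
            y = matvec(self.experts[i], x)
            out = [o + (probs[i] / norm) * yi for o, yi in zip(out, y)]
        return out


class DLGMoELayer:
    """Hypothetical layer: a language router dispatches each frame to one language group."""

    def __init__(self, lang_router: Matrix, groups: Dict[str, LanguageGroup]):
        self.languages = list(groups)  # row i of lang_router scores languages[i]
        self.lang_router = lang_router
        self.groups = groups

    @classmethod
    def build(cls, dim: int, languages: List[str], experts_per_group: int,
              seed: int = 0) -> "DLGMoELayer":
        rng = random.Random(seed)
        groups = {lang: LanguageGroup(dim, experts_per_group, rng) for lang in languages}
        return cls(rand_matrix(len(groups), dim, rng), groups)

    def route_language(self, x: Vector) -> str:
        # Hard (argmax) language decision; the language supervision is omitted here.
        scores = matvec(self.lang_router, x)
        return self.languages[max(range(len(scores)), key=scores.__getitem__)]

    def forward(self, frames: List[Vector], top_k: int = 1) -> Tuple[List[Vector], List[str]]:
        outputs, langs = [], []
        for x in frames:
            lang = self.route_language(x)  # level 1: pick a language group
            langs.append(lang)
            outputs.append(self.groups[lang].forward(x, top_k))  # level 2: top-k experts
        return outputs, langs

    def prune_to(self, lang: str) -> "DLGMoELayer":
        """Keep only one language group (and its router row) to get a monolingual layer."""
        row = self.lang_router[self.languages.index(lang)]
        return DLGMoELayer([row], {lang: self.groups[lang]})


layer = DLGMoELayer.build(dim=4, languages=["zh", "en"], experts_per_group=4, seed=0)
frames = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.5, -0.5, 0.2, 0.1]]
out_top1, langs = layer.forward(frames, top_k=1)  # same weights, different top-k at inference
out_top2, _ = layer.forward(frames, top_k=2)
mono_out, mono_langs = layer.prune_to("en").forward(frames, top_k=2)
print(langs, mono_langs)
```

`prune_to` reuses the same group object. As a result, any frame that the full layer routes to `en` receives an identical output from the pruned layer. This mirrors, in miniature, the idea of carving monolingual sub-models out of the full model. Changing `top_k` at inference only changes how many experts inside the selected group are mixed.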
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Robotics and Automated Systems
Methods: Mixture of Experts
