Dynamic Language Group-Based MoE: Enhancing Code-Switching Speech Recognition with Hierarchical Routing

Hukai Huang; Shenghui Lu; Yahui Shan; He Qu; Fengrun Zhang; Wenhao Guan; Qingyang Hong; Lin Li

arXiv:2407.18581·cs.CL·December 24, 2024

TL;DR

This paper introduces DLG-MoE, a hierarchical routing-based MoE model that significantly improves code-switching speech recognition by effectively leveraging parameter scaling and offering flexible inference and pruning capabilities.

Contribution

The paper proposes DLG-MoE, a novel hierarchical routing MoE model that enhances CS-ASR performance and flexibility over existing MoE approaches.

Findings

01. DLG-MoE outperforms existing MoE methods on CS-ASR tasks.

02. It supports inference with different top-$k$ settings and streaming capabilities.

03. It enables flexible model pruning to create monolingual sub-models.

Abstract

The Mixture of Experts (MoE) model is a promising approach for handling code-switching speech recognition (CS-ASR) tasks. However, existing MoE-based CS-ASR work has yet to fully leverage MoE's parameter scaling ability. This work proposes DLG-MoE, a Dynamic Language Group-based MoE, which can effectively handle the CS-ASR task and leverage the advantages of parameter scaling. DLG-MoE operates based on a hierarchical routing mechanism. First, the language router explicitly models the language attribute and dispatches the representations to the corresponding language expert groups. Subsequently, the unsupervised router within each language group implicitly models attributes beyond language and coordinates expert routing and collaboration. DLG-MoE outperforms the existing MoE methods on CS-ASR tasks while demonstrating great flexibility. It supports different top-$k$ …
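The two-level routing described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the class name `ToyDLGMoE`, the weight names, the hard argmax dispatch at the language level, the softmax over the selected top-$k$ logits, and the use of plain linear maps as experts are all assumptions made for illustration. It only shows the control flow: a language router first picks one expert group per frame, then a per-group router picks and mixes the top-$k$ experts inside that group.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class ToyDLGMoE:
    """Hypothetical two-level router: language router, then per-group router."""

    def __init__(self, dim, n_groups=2, experts_per_group=4, seed=0):
        rng = np.random.default_rng(seed)
        # Language router: one logit per language group (assumed linear layer).
        self.w_lang = rng.normal(size=(dim, n_groups))
        # One router per group: one logit per expert in that group.
        self.w_group = rng.normal(size=(n_groups, dim, experts_per_group))
        # Experts are plain linear maps here (assumption, kept small on purpose).
        self.w_expert = rng.normal(
            size=(n_groups, experts_per_group, dim, dim)) / np.sqrt(dim)

    def forward(self, x, top_k=2):
        """x: (frames, dim). Returns (output, group_ids, expert_ids)."""
        # Level 1: dispatch each frame to exactly one language group.
        group_ids = (x @ self.w_lang).argmax(axis=-1)
        out = np.zeros_like(x)
        expert_ids = np.zeros((x.shape[0], top_k), dtype=int)
        for g in range(self.w_group.shape[0]):
            idx = np.where(group_ids == g)[0]
            if idx.size == 0:
                continue
            xg = x[idx]
            # Level 2: the group's own router scores its experts.
            logits = xg @ self.w_group[g]                   # (n, experts)
            top = np.argsort(-logits, axis=-1)[:, :top_k]   # (n, top_k)
            gates = softmax(np.take_along_axis(logits, top, axis=-1))
            expert_ids[idx] = top
            for slot in range(top_k):
                for e in np.unique(top[:, slot]):
                    rows = top[:, slot] == e
                    weight = gates[rows, slot][:, None]
                    out[idx[rows]] += weight * (xg[rows] @ self.w_expert[g, e])
        return out, group_ids, expert_ids

# Usage: the same toy weights evaluated with two different top-k settings.
model = ToyDLGMoE(dim=8)
frames = np.random.default_rng(1).normal(size=(6, 8))
y1, groups, experts1 = model.forward(frames, top_k=1)
y2, _, experts2 = model.forward(frames, top_k=2)
```

Because `top_k` is only an argument of `forward` in this sketch, the same weights can be evaluated with different values of $k$, which is one plausible reading of the top-$k$ flexibility the abstract mentions.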

Code & Models

Repositories

kaihuhuang/language-group (PyTorch, official)

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Robotics and Automated Systems

Methods: Mixture of Experts