Enhancing Code-Switching Speech Recognition with LID-Based Collaborative Mixture of Experts Model

Hukai Huang; Jiayan Lin; Kaidi Wang; Yishuang Li; Wenhao Guan; Lin Li; Qingyang Hong

arXiv:2409.02050 · cs.CL · September 6, 2024

Open Access

TL;DR

This paper introduces a collaborative Mixture of Experts model for code-switching speech recognition, utilizing language identification to improve expert routing and collaboration, leading to significant performance gains without extra pre-training.

Contribution

The study presents a novel LID-based collaborative MoE model that enhances routing and collaboration among language experts in code-switching speech recognition.

Findings

01. Achieved significant performance improvements over baseline methods.
02. Maintained efficient inference without additional pre-training.
03. Effectively integrated language-specific and attribute-based representations.

Abstract

Due to the inherent difficulty in modeling phonetic similarities across different languages, code-switching speech recognition presents a formidable challenge. This study proposes Collaborative-MoE, a Mixture of Experts (MoE) model that leverages a collaborative mechanism among expert groups. Initially, a preceding routing network explicitly learns the Language Identification (LID) task and selects experts based on the acquired LID weights. This process ensures robust routing information to the MoE layer, mitigating interference from diverse language domains on expert network parameter updates. The LID weights are also employed to facilitate inter-group collaboration, enabling the integration of language-specific representations. Furthermore, within each language expert group, a gating network operates unsupervised to foster collaboration on attributes beyond language. Extensive experiments…
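The two levels of weighting described in the abstract can be illustrated with a toy sketch. Everything below is an assumption made for illustration, not the authors' implementation: the class name `CollaborativeMoESketch`, the use of plain linear maps as experts, the random initialization, the soft (rather than top-k) mixing across language groups, and the two-language, two-experts-per-group setup are all hypothetical. The sketch only shows how LID weights could combine language groups while a separate gate combines experts inside each group; training of the LID router is omitted.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Apply a weight matrix (a list of rows) to a vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class CollaborativeMoESketch:
    """Toy layer: an LID router weights language groups, a gate weights experts per group."""

    def __init__(self, dim, num_languages=2, experts_per_group=2, seed=0):
        rng = random.Random(seed)
        # Hypothetical LID router: one score per language (trained with LID labels in the paper).
        self.lid_router = random_matrix(num_languages, dim, rng)
        # Hypothetical per-group gates: one score per expert, no direct supervision.
        self.group_gates = [
            random_matrix(experts_per_group, dim, rng) for _ in range(num_languages)
        ]
        # Each expert is a plain linear map here; real experts would be larger networks.
        self.experts = [
            [random_matrix(dim, dim, rng) for _ in range(experts_per_group)]
            for _ in range(num_languages)
        ]

    def forward(self, x):
        """Return the mixed output and the LID weights for one feature vector."""
        lid_weights = softmax(linear(self.lid_router, x))
        output = [0.0] * len(x)
        for g, lid_w in enumerate(lid_weights):
            # Intra-group collaboration: the gate mixes experts of language group g.
            gate_weights = softmax(linear(self.group_gates[g], x))
            for k, gate_w in enumerate(gate_weights):
                expert_out = linear(self.experts[g][k], x)
                # Inter-group collaboration: the LID weight scales the group's contribution.
                for i, value in enumerate(expert_out):
                    output[i] += lid_w * gate_w * value
        return output, lid_weights


# Usage: one 4-dimensional frame passed through the toy layer.
model = CollaborativeMoESketch(dim=4)
frame = [0.1, -0.2, 0.3, 0.05]
output, lid_weights = model.forward(frame)
```

In this sketch the LID weights act as soft mixing coefficients over language groups; a hard or top-k selection of groups would be an equally plausible reading of "selects experts based on acquired LID weights", and the abstract alone does not settle which one is used.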

Taxonomy

Topics: Speech Recognition and Synthesis

Methods: Mixture of Experts