Internal Language Model Estimation based Language Model Fusion for Cross-Domain Code-Switching Speech Recognition

Yizhou Peng; Yufei Liu; Jicheng Zhang; Haihua Xu; Yi He; Hao Huang; Eng Siong Chng

arXiv:2207.04176 · eess.AS · July 12, 2022 · 5 cites

Open Access

TL;DR

This paper explores the application of Internal Language Model Estimation (ILME) based fusion to improve cross-domain code-switching speech recognition, demonstrating its effectiveness across different datasets and domain combinations.

Contribution

It extends ILME-based language model fusion to cross-domain code-switching speech recognition and evaluates it both on code-switching data, with and without merging two domains, and on an E2E model trained by merging two monolingual data sets.

Findings

01. ILME fusion improves recognition accuracy in cross-domain CSSR

02. Effective for intra-domain and cross-domain tasks

03. Works well with merged monolingual datasets

Abstract

Internal Language Model Estimation (ILME) based language model (LM) fusion has been shown to significantly improve recognition results over conventional shallow fusion in both intra-domain and cross-domain speech recognition tasks. In this paper, we apply our ILME method to cross-domain code-switching speech recognition (CSSR). Specifically, we investigate several questions. First, how effective is ILME-based LM fusion for both intra-domain and cross-domain CSSR tasks? We examine this both with and without merging two code-switching domains. More importantly, we train an end-to-end (E2E) speech recognition model by merging two monolingual data sets and observe the efficacy of the proposed ILME-based LM fusion for CSSR. Experimental results on SEAME, a code-switching data set from Southeast Asia, and on another CS data set from Mainland China demonstrate the…
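The abstract contrasts ILME-based LM fusion with conventional shallow fusion but does not spell out the scoring rule. The sketch below uses the formulation commonly given in the ILME literature, which is an assumption here rather than something stated on this page: shallow fusion adds a weighted external LM log-probability to the E2E score, while ILME fusion additionally subtracts a weighted estimate of the E2E model's internal LM log-probability. The class and function names, the weights, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class HypothesisScores:
    """Log-probabilities (natural log) for one candidate transcript."""
    text: str
    e2e_logp: float     # log P(y | x) from the E2E ASR model
    ext_lm_logp: float  # log P(y) from the external, target-domain LM
    ilm_logp: float     # log P(y) from the estimated internal LM


def shallow_fusion_score(h, ext_weight):
    # Conventional shallow fusion: E2E score plus weighted external LM score.
    return h.e2e_logp + ext_weight * h.ext_lm_logp


def ilme_fusion_score(h, ext_weight, ilm_weight):
    # ILME-style fusion (assumed standard form): additionally subtract the
    # weighted internal LM score to discount the source-domain text prior.
    return h.e2e_logp + ext_weight * h.ext_lm_logp - ilm_weight * h.ilm_logp


def rerank(hyps, score_fn):
    # Return the hypothesis with the highest fused score.
    return max(hyps, key=score_fn)


# Toy numbers (made up): hyp_a is favoured by the internal LM,
# hyp_b is favoured by the external target-domain LM.
hyps = [
    HypothesisScores("hyp_a", e2e_logp=-4.0, ext_lm_logp=-9.0, ilm_logp=-2.0),
    HypothesisScores("hyp_b", e2e_logp=-4.5, ext_lm_logp=-6.0, ilm_logp=-6.0),
]

best_shallow = rerank(hyps, lambda h: shallow_fusion_score(h, 0.1))
best_ilme = rerank(hyps, lambda h: ilme_fusion_score(h, 0.1, 0.1))
print(best_shallow.text, best_ilme.text)
```

With these toy scores, shallow fusion keeps the hypothesis that the internal LM prefers, while the ILME-style score switches to the one the external LM prefers. In practice both weights would be tuned on a development set, and the internal LM score would come from whichever estimation procedure the system uses.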

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research