Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition

Zhong Meng; Naoyuki Kanda; Yashesh Gaur; Sarangarajan Parthasarathy; Eric Sun; Liang Lu; Xie Chen; Jinyu Li; Yifan Gong

arXiv:2102.01380·eess.AS·April 26, 2021·1 cites

Open Access

TL;DR

This paper introduces an internal LM training method that enhances domain adaptation in end-to-end speech recognition by enabling better external language model integration without sacrificing accuracy.

Contribution

The proposed internal LM training (ILMT) method trains end-to-end (E2E) models to form a standalone internal LM, improving external LM integration and domain adaptation in speech recognition systems.

Findings

01. Achieved up to 31.5% relative WER reduction on LibriSpeech

02. Achieved up to 11.4% relative WER reduction on Microsoft production data

03. Enhanced external LM integration without degrading ASR accuracy

Abstract

The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method. In this method, the internal LM score is subtracted from the score obtained by interpolating the E2E score with the external LM score, during inference. To improve the ILME-based inference, we propose an internal LM training (ILMT) method to minimize an additional internal LM loss by updating only the E2E model components that affect the internal LM estimation. ILMT encourages the E2E model to form a standalone LM inside its existing components, without sacrificing ASR accuracy. After ILMT, the more modular E2E model with matched training and inference criteria enables a more thorough elimination of the source-domain internal LM, and therefore leads to a more effective…
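The scoring rule and training objective described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the hypothesis dictionary layout, and the weights `ext_weight`, `ilm_weight`, and `ilm_loss_weight` are all assumptions introduced here. The sketch also treats the ILMT objective as a plain weighted sum and does not model the restriction that the internal LM loss updates only the model components that affect internal LM estimation.

```python
def ilme_score(log_p_e2e, log_p_ext, log_p_ilm, ext_weight=0.5, ilm_weight=0.3):
    """Hypothetical ILME-style score for one hypothesis.

    Interpolates the E2E log-probability with the external LM
    log-probability, then subtracts the (weighted) internal LM
    log-probability. The weights are illustrative, not from the paper.
    """
    return log_p_e2e + ext_weight * log_p_ext - ilm_weight * log_p_ilm


def best_hypothesis(hypotheses, ext_weight=0.5, ilm_weight=0.3):
    """Pick the highest-scoring hypothesis from an n-best list.

    Each hypothesis is assumed to be a dict with keys
    'text', 'e2e', 'ext', and 'ilm' holding log-probabilities.
    """
    return max(
        hypotheses,
        key=lambda h: ilme_score(h["e2e"], h["ext"], h["ilm"], ext_weight, ilm_weight),
    )


def ilmt_loss(e2e_loss, ilm_loss, ilm_loss_weight=0.5):
    """Hypothetical ILMT objective: standard E2E loss plus a weighted
    internal LM loss. Which parameters receive gradients from the
    second term is not modeled in this sketch."""
    return e2e_loss + ilm_loss_weight * ilm_loss


# Toy n-best list with made-up log-probabilities.
nbest = [
    {"text": "a", "e2e": -1.0, "ext": -4.0, "ilm": -1.0},
    {"text": "b", "e2e": -1.2, "ext": -2.0, "ilm": -3.0},
]
chosen = best_hypothesis(nbest)
```

In this toy list, hypothesis "a" has the higher E2E score on its own, but "b" is selected once the external LM score is added and the internal LM score is subtracted, which is the kind of re-ranking the subtraction is meant to enable.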

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling