Adaptive Multi-Corpora Language Model Training for Speech Recognition

Yingyi Ma; Zhe Liu; Xuedong Zhang

arXiv:2211.05121·eess.AS·November 11, 2022

TL;DR

This paper introduces an adaptive training algorithm for neural network language models in speech recognition, dynamically adjusting data sampling from multiple corpora to improve adaptation performance.

Contribution

The paper proposes an adaptive sampling method that adjusts the sampling probability of each corpus during NNLM training, and it outperforms static sampling baselines on speech recognition adaptation tasks.

Findings

01 Achieves up to a 7% relative WER reduction in in-domain adaptation.

02 Achieves up to a 9% relative WER reduction in out-of-domain adaptation.

03 Demonstrates robustness to corpus size and domain relevance.
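
The abstract reports these gains as relative word error rate (WER) reductions, meaning the improvement is measured as a fraction of the baseline WER rather than in absolute percentage points. A minimal sketch of that arithmetic follows; the function name and the example WER values are hypothetical and are not taken from the paper.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values for illustration only (not results from the paper):
# a baseline WER of 10.0% that drops to 9.3% is a 0.7-point absolute gain.
reduction = relative_wer_reduction(10.0, 9.3)
print(f"{reduction:.1%}")  # prints 7.0%
```

Under these assumed numbers, a 7% relative reduction corresponds to only 0.7 absolute WER points, which is why the distinction matters when comparing systems.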

Abstract

Neural network language models (NNLMs) play an essential role in automatic speech recognition (ASR) systems, especially in adaptation tasks where only text data is available. In practice, an NNLM is typically trained on a combination of data sampled from multiple corpora, so the data sampling strategy is important to adaptation performance. Most existing work focuses on designing static sampling strategies. However, each corpus may have a different impact at different stages of NNLM training. In this paper, we introduce a novel adaptive multi-corpora training algorithm that dynamically learns and adjusts the sampling probability of each corpus throughout training. The algorithm is robust to corpus size and domain relevance. Compared with static sampling baselines, the proposed approach yields remarkable improvement, achieving up to 7% and 9% relative word error rate…
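
To make the idea of per-corpus sampling probabilities concrete, here is a minimal sketch of multi-corpora batch sampling in which the probabilities are revised after each training stage. It is not the paper's algorithm: the abstract does not say how the probabilities are learned, so this sketch assumes a hypothetical per-corpus usefulness score at each stage and maps those scores to probabilities with a softmax. The corpora, the scores, the temperature, and all function names are illustrative assumptions.

```python
import math
import random


def softmax(scores, temperature=1.0):
    """Turn real-valued scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - peak) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_batch(corpora, probs, batch_size, rng):
    """Pick a corpus per example according to probs, then a sentence uniformly."""
    names = list(corpora)
    picked = rng.choices(names, weights=probs, k=batch_size)
    return [(name, rng.choice(corpora[name])) for name in picked]


# Toy corpora of very different sizes (hypothetical data).
corpora = {
    "in_domain": ["play some jazz", "call mom"],
    "general": ["the weather is nice today"] * 1000,
}
rng = random.Random(0)
probs = [0.5, 0.5]  # start from uniform sampling over corpora

# Hypothetical usefulness scores per stage, ordered as in `corpora`.
# ASSUMPTION: a real system would have to derive these from training signals.
stage_scores = [[0.1, 0.4], [0.3, 0.2], [0.6, 0.1]]

history = []
for scores in stage_scores:
    batch = sample_batch(corpora, probs, batch_size=8, rng=rng)
    # A real training loop would update the NNLM on `batch` here.
    probs = softmax(scores, temperature=0.5)  # revise probabilities for the next stage
    history.append(probs)

for stage, p in enumerate(history, start=1):
    print(stage, [round(x, 3) for x in p])
```

In this sketch the corpus is chosen first and a sentence is drawn from it second, so a large corpus does not dominate a batch through size alone. A static strategy would correspond to keeping `probs` fixed across all stages.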

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing