Learning Domain Specific Language Models for Automatic Speech Recognition through Machine Translation
Saurav Jha

TL;DR
This paper proposes a method to improve task-specific language models for speech recognition by using neural machine translation to generate translation-based training data and confusion networks, leading to lower perplexity.
Contribution
It introduces an approach that derives word confusion networks from NMT beam search graphs and uses them to train language models for speech recognition when task-specific text exists only in another language.
Findings
NMT confusion networks reduce language model perplexity.
Training on confusion networks outperforms using only N-best translations.
The method improves ASR language modeling in cross-lingual scenarios.
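The findings above are stated in terms of perplexity. For reference, the standard definition for a test sequence of N words is given below; the symbols are generic notation, not taken from the paper.

```latex
\mathrm{PPL}(w_1, \dots, w_N)
  = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log P\left(w_i \mid w_1, \dots, w_{i-1}\right) \right)
```

Lower perplexity means the language model assigns higher probability to the held-out text.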
Abstract
Automatic Speech Recognition (ASR) systems have been gaining popularity in recent years for their widespread usage in smartphones and speakers. Building ASR systems for task-specific scenarios is subject to the availability of utterances that adhere to the style of the task as well as the language in question. In our work, we target such a scenario wherein task-specific text data is available in a language that is different from the target language in which an ASR Language Model (LM) is expected. We use Neural Machine Translation (NMT) as an intermediate step to first obtain translations of the task-specific text data. We then train LMs on the 1-best and N-best translations and study ways to improve on such a baseline LM. We develop a procedure to derive word confusion networks from NMT beam search graphs and evaluate LMs trained on these confusion networks. With experiments on the…
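To make the idea of training an LM on confusion networks concrete, here is a minimal sketch. It is not the paper's procedure: it assumes a confusion network is already available as a list of slots, each holding (word, posterior) pairs, treats adjacent slots as independent, ignores epsilon (skip) arcs, and uses a simple add-k bigram model. All function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def expected_bigram_counts(network):
    """Fractional bigram counts from one confusion network.

    Assumption: `network` is a list of slots; each slot is a list of
    (word, posterior) pairs whose posteriors sum to 1. Adjacent slots
    are treated as independent, so a bigram (u, v) gets weight p(u) * p(v).
    """
    counts = defaultdict(float)
    padded = [[("<s>", 1.0)]] + network + [[("</s>", 1.0)]]
    for left, right in zip(padded, padded[1:]):
        for u, pu in left:
            for v, pv in right:
                counts[(u, v)] += pu * pv
    return counts


def train_bigram_lm(networks, k=0.1):
    """Add-k smoothed bigram LM estimated from fractional counts."""
    bigrams = defaultdict(float)
    context = defaultdict(float)
    vocab = {"</s>"}
    for net in networks:
        for (u, v), c in expected_bigram_counts(net).items():
            bigrams[(u, v)] += c
            context[u] += c
            vocab.add(v)
    vocab_size = len(vocab)

    def prob(u, v):
        # Probabilities are normalized only over words seen in training.
        return (bigrams.get((u, v), 0.0) + k) / (context.get(u, 0.0) + k * vocab_size)

    return prob


def perplexity(prob, sentences):
    """Perplexity of tokenized sentences under the bigram model."""
    log_sum, n = 0.0, 0
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for u, v in zip(tokens, tokens[1:]):
            log_sum += math.log(prob(u, v))
            n += 1
    return math.exp(-log_sum / n)


# Toy example: the first slot is ambiguous between "play" and "pay".
toy_network = [[("play", 0.7), ("pay", 0.3)], [("music", 1.0)]]
lm = train_bigram_lm([toy_network])
print(perplexity(lm, [["play", "music"]]))
```

Compared with training on the 1-best translation alone, the fractional counts let lower-ranked alternatives such as "pay" contribute to the model in proportion to their posterior weight.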
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
