Investigating the Effect of Language Models in Sequence Discriminative Training for Neural Transducers
Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

TL;DR
This paper explores how different language model configurations impact sequence discriminative training for neural transducers, proposing methods to incorporate full-context and word-level LMs, with experiments on Librispeech.
Contribution
The paper introduces a method that approximates the context history so that LMs with full-context dependency can be used in lattice-free training, and it systematically compares lattice-free and N-best-list approaches.
Findings
Word-level LMs outperform phoneme-level LMs in training.
LM context size has limited impact on performance.
Hypothesis space quality is crucial for training success.
Abstract
In this work, we investigate the effect of language models (LMs) with different context lengths and label units (phoneme vs. word) used in sequence discriminative training for phoneme-based neural transducers. Both lattice-free and N-best-list approaches are examined. For lattice-free methods with phoneme-level LMs, we propose a method to approximate the context history to employ LMs with full-context dependency. This approximation can be extended to arbitrary context length and enables the usage of word-level LMs in lattice-free methods. Moreover, a systematic comparison is conducted across lattice-free and N-best-list-based methods. Experimental results on Librispeech show that using the word-level LM in training outperforms the phoneme-level LM. Besides, we find that the context size of the LM used for probability computation has a limited effect on performance. Moreover, our results…
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Domain Adaptation and Few-Shot Learning
