Investigating the Effect of Language Models in Sequence Discriminative Training for Neural Transducers

Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

arXiv:2310.07345 · cs.CL · October 12, 2023

TL;DR

This paper examines how the context length and label unit (phoneme vs. word) of the language model (LM) used in sequence discriminative training affect phoneme-based neural transducers, proposes a way to use full-context and word-level LMs in lattice-free training, and evaluates on Librispeech.

Contribution

It introduces an approximation of the context history that allows LMs with full-context dependency, including word-level LMs, to be used in lattice-free training, and it systematically compares lattice-free and N-best-list-based approaches.
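To make the N-best-list side of this comparison concrete, here is a minimal sketch of a sequence-level MMI-style loss computed over an N-best list. It assumes that each hypothesis W_i already has a transducer log-score log p(X | W_i) and an LM log-score log P(W_i), that the two are combined log-linearly with a hypothetical weight `lm_scale`, and that the reference transcription is included in the list. The function names and the exact criterion are illustrative assumptions, not necessarily the paper's formulation.

```python
import math
from typing import Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def nbest_mmi_loss(
    am_log_probs: Sequence[float],
    lm_log_probs: Sequence[float],
    ref_index: int,
    lm_scale: float = 0.5,
) -> float:
    """Negative log posterior of the reference among the N-best hypotheses.

    am_log_probs[i]: transducer log p(X | W_i) for hypothesis i (assumed given)
    lm_log_probs[i]: LM log P(W_i) for hypothesis i (assumed given)
    ref_index: position of the reference transcription in the list
    lm_scale: hypothetical LM weight in the log-linear combination
    """
    # Combined log-score of every hypothesis in the list.
    scores = [am + lm_scale * lm for am, lm in zip(am_log_probs, lm_log_probs)]
    # -log( exp(score_ref) / sum_i exp(score_i) )
    return log_sum_exp(scores) - scores[ref_index]


if __name__ == "__main__":
    # Two-hypothesis toy list; the reference (index 0) has the better score.
    print(nbest_mmi_loss([-1.0, -5.0], [-1.0, -1.0], ref_index=0))
```

In this sketch, switching from a phoneme-level to a word-level LM only changes the `lm_log_probs` inputs; the hypothesis space itself is fixed by the N-best list, which is why its quality matters.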

Findings

1. Word-level LMs outperform phoneme-level LMs in training.
2. LM context size has a limited impact on performance.
3. Hypothesis-space quality is crucial for training success.

Abstract

In this work, we investigate the effect of language models (LMs) with different context lengths and label units (phoneme vs. word) used in sequence discriminative training for phoneme-based neural transducers. Both lattice-free and N-best-list approaches are examined. For lattice-free methods with phoneme-level LMs, we propose a method to approximate the context history to employ LMs with full-context dependency. This approximation can be extended to arbitrary context length and enables the usage of word-level LMs in lattice-free methods. Moreover, a systematic comparison is conducted across lattice-free and N-best-list-based methods. Experimental results on Librispeech show that using the word-level LM in training outperforms the phoneme-level LM. In addition, we find that the context size of the LM used for probability computation has a limited effect on performance. Moreover, our results…
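Why context length matters for lattice-free methods can be illustrated with a toy example. Lattice-free training sums over all competing label sequences instead of an N-best list, and with a bounded-context LM this sum can be computed by a forward recursion whose state is only the last label. The sketch below assumes a hypothetical label-synchronous model that emits exactly one label per position (no blank, no alignment paths, so it is not a transducer) with per-position log-scores `am[t][v]` and a phoneme bigram LM (context size 1). It checks the recursion against brute-force enumeration and is not the paper's context-history approximation.

```python
import itertools
import math
from typing import List, Sequence


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def denominator_bigram_dp(
    am: List[List[float]],  # am[t][v]: hypothetical per-position label log-score
    lm_bos: List[float],    # lm_bos[v]: log P(v | <s>)
    lm: List[List[float]],  # lm[u][v]: log P(v | u), bigram LM (context size 1)
) -> float:
    """Log of the summed score of ALL label sequences of length T, in O(T * V^2)."""
    V = len(lm_bos)
    # alpha[v]: log-sum over all prefixes ending in label v.
    alpha = [lm_bos[v] + am[0][v] for v in range(V)]
    for t in range(1, len(am)):
        alpha = [
            log_sum_exp([alpha[u] + lm[u][v] for u in range(V)]) + am[t][v]
            for v in range(V)
        ]
    return log_sum_exp(alpha)


def denominator_brute_force(am, lm_bos, lm) -> float:
    """Same quantity by enumerating all V**T sequences (exponential cost)."""
    V, T = len(lm_bos), len(am)
    scores = []
    for seq in itertools.product(range(V), repeat=T):
        s = lm_bos[seq[0]] + am[0][seq[0]]
        for t in range(1, T):
            s += lm[seq[t - 1]][seq[t]] + am[t][seq[t]]
        scores.append(s)
    return log_sum_exp(scores)


if __name__ == "__main__":
    am = [[-0.5, -1.2, -2.0], [-1.0, -0.3, -1.5], [-0.7, -0.9, -0.4]]
    lm_bos = [math.log(p) for p in (0.5, 0.3, 0.2)]
    lm = [[math.log(p) for p in row]
          for row in ([0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.3, 0.3, 0.4])]
    print(denominator_bigram_dp(am, lm_bos, lm))
    print(denominator_brute_force(am, lm_bos, lm))
```

With a full-context LM, the recursion state would have to be the entire label history, so the number of states grows exponentially with sequence length and the recursion loses its advantage over enumeration; this is the situation the proposed context-history approximation addresses.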

Taxonomy

Topics: Speech Recognition and Synthesis · Topic Modeling · Domain Adaptation and Few-Shot Learning