On the Relation between Internal Language Model and Sequence Discriminative Training for Neural Transducers

Zijian Yang; Wei Zhou; Ralf Schlüter; Hermann Ney

arXiv:2309.14130·cs.SD·April 16, 2024

Open Access

TL;DR

This paper explores the theoretical and empirical relationship between internal language model subtraction and sequence discriminative training in neural transducers, revealing their strong correlation and similar effects on speech recognition performance.

Contribution

It demonstrates that sequence discriminative training and ILM subtraction are closely related, both theoretically and empirically, and shows that their effects overlap in neural transducer training.

Findings

01 Sequence discriminative training reduces the benefit of ILM subtraction.

02 A theoretical derivation shows that the global optimum of MMI training has a form similar to the ILM subtraction formula.

03 Empirical results on Librispeech confirm the correlation across various training criteria.
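
The theoretical link named in the findings can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the paper's exact derivation: the symbols P_theta (transducer posterior), P_LM (external LM), P_ILM (internal LM estimate), Pr (true posterior), and the scales lambda, lambda_1, lambda_2 are notation chosen here, and the MMI criterion is assumed to include the external LM in both numerator and denominator.

```latex
% Requires amsmath. All symbols are illustrative assumptions (see lead-in).
\begin{align*}
% Decoding with external LM fusion and ILM subtraction (log-linear combination):
W^{*} &= \arg\max_{W} \Big[ \log P_{\theta}(W \mid X)
         + \lambda_{1} \log P_{\mathrm{LM}}(W)
         - \lambda_{2} \log P_{\mathrm{ILM}}(W) \Big] \\[4pt]
% MMI criterion: expected log of a renormalized sequence posterior q_theta:
F_{\mathrm{MMI}}(\theta) &= \mathbb{E}_{(X,W)} \big[ \log q_{\theta}(W \mid X) \big],
\qquad
q_{\theta}(W \mid X) = \frac{P_{\theta}(W \mid X)\, P_{\mathrm{LM}}^{\lambda}(W)}
     {\sum_{W'} P_{\theta}(W' \mid X)\, P_{\mathrm{LM}}^{\lambda}(W')} \\[4pt]
% The expected log-likelihood of a normalized distribution is maximal when it
% equals the true posterior, so (given enough model capacity) at the optimum:
q_{\theta}(W \mid X) &= \Pr(W \mid X)
\;\;\Longrightarrow\;\;
P_{\theta}(W \mid X) \;\propto\; \frac{\Pr(W \mid X)}{P_{\mathrm{LM}}^{\lambda}(W)}
\end{align*}
```

Under these assumptions, the optimal transducer output is the true posterior divided by a scaled LM prior, which in log space has the same shape as subtracting an ILM score during decoding.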

Abstract

Internal language model (ILM) subtraction has been widely applied to improve the performance of the RNN-Transducer with external language model (LM) fusion for speech recognition. In this work, we show that sequence discriminative training has a strong correlation with ILM subtraction from both theoretical and empirical points of view. Theoretically, we derive that the global optimum of maximum mutual information (MMI) training shares a similar formula with ILM subtraction. Empirically, we show that ILM subtraction and sequence discriminative training achieve similar effects across a wide range of experiments on Librispeech, including both MMI and minimum Bayes risk (MBR) criteria, as well as neural transducers and LMs of both full and limited context. The benefit of ILM subtraction also becomes much smaller after sequence discriminative training. We also provide an in-depth study to show…
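
To make "ILM subtraction with external LM fusion" concrete, here is a minimal Python sketch of n-best rescoring with a log-linear score. It is not the authors' implementation: the `Hypothesis` class, the function names, and the scale values are hypothetical, and the three log-probabilities are assumed to have been computed elsewhere (by the transducer, the external LM, and some ILM estimate).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One n-best entry with precomputed log-probabilities (hypothetical container)."""
    text: str
    log_p_transducer: float  # log P(W | X) from the neural transducer
    log_p_ext_lm: float      # log P_LM(W) from the external LM
    log_p_ilm: float         # log P_ILM(W), an estimate of the internal LM


def fused_score(h: Hypothesis, lm_scale: float, ilm_scale: float) -> float:
    """Log-linear fusion: add the scaled external LM score, subtract the scaled ILM score."""
    return h.log_p_transducer + lm_scale * h.log_p_ext_lm - ilm_scale * h.log_p_ilm


def best_hypothesis(nbest: List[Hypothesis], lm_scale: float, ilm_scale: float) -> Hypothesis:
    """Return the hypothesis with the highest fused score."""
    return max(nbest, key=lambda h: fused_score(h, lm_scale, ilm_scale))


# Toy n-best list; the numbers are made up purely for illustration.
NBEST = [
    Hypothesis("a", log_p_transducer=-1.0, log_p_ext_lm=-5.0, log_p_ilm=-1.0),
    Hypothesis("b", log_p_transducer=-1.8, log_p_ext_lm=-4.0, log_p_ilm=-4.0),
]
```

Setting `ilm_scale=0.0` gives plain LM fusion, so calling `best_hypothesis(NBEST, 0.5, 0.0)` and `best_hypothesis(NBEST, 0.5, 0.3)` shows how subtracting the ILM score can change which hypothesis wins. In practice both scales would be tuned on held-out data.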

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Topic Modeling