Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers
Zijian Yang, Wei Zhou, Ralf Schlüter, Hermann Ney

TL;DR
This paper introduces three lattice-free sequence discriminative training methods for phoneme-based neural transducers, improving training efficiency and reducing word error rates in speech recognition tasks.
Contribution
It proposes novel lattice-free training objectives for neural transducers, eliminating the decoding step during training and significantly speeding up the process.
Findings
Up to 6.5% relative WER improvement over cross-entropy training.
40%-70% reduction in training time compared to N-best-list based methods.
Lattice-free methods perform comparably to N-best-list based methods, with only a small degradation.
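The figures above are relative, not absolute, changes. As a minimal sketch of how such a figure is computed, the helper below uses the standard definition of relative improvement; the function name and the WER values are hypothetical illustrations, not numbers from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the relative reduction of `new` versus `baseline`, in percent.

    Works for any lower-is-better quantity, e.g. word error rate (WER)
    or training time.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - new) / baseline


if __name__ == "__main__":
    # Hypothetical values chosen only to illustrate the arithmetic:
    # a baseline WER of 10.0% reduced to 9.35% (0.65 points absolute).
    print(f"{relative_improvement(10.0, 9.35):.1f}% relative")  # prints "6.5% relative"
```

With these made-up values, an absolute drop of 0.65 WER points corresponds to a 6.5% relative improvement, which is why a relative figure should always be read together with its baseline.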
Abstract
Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated in RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk, which are used for the final posterior output of the phoneme-based neural transducer with a limited context dependency. Compared to criteria using N-best lists, lattice-free methods eliminate the decoding step for hypotheses generation during training, which leads to more efficient training. Experimental results show that lattice-free methods gain up to 6.5% relative improvement in word error rate compared to a sequence-level…
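For orientation, the maximum mutual information criterion named in the abstract is commonly written in the textbook form below. This is a sketch of the generic criterion, not necessarily the exact formulation used in the paper; the symbols X (acoustic input), W_r (reference word sequence), and W (a competing word sequence) are assumptions introduced here.

```latex
\mathcal{F}_{\mathrm{MMI}}
  = \log \frac{p(X \mid W_r)\, P(W_r)}
              {\sum_{W} p(X \mid W)\, P(W)}
```

In lattice-based or N-best-list based training, the sum in the denominator is approximated over hypotheses produced by a decoding pass. A lattice-free variant instead computes the sum over the full hypothesis space directly, which is feasible when the model's context dependency is limited, as the abstract describes.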
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
