TL;DR
This paper introduces a novel method to incorporate lattice-free MMI discriminative training into end-to-end speech recognition systems, improving performance and training efficiency over previous MBR-based approaches.
Contribution
The work proposes integrating LF-MMI into E2E ASR systems during both training and decoding, which keeps the training and decoding criteria consistent and removes the need for on-the-fly decoding during training.
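For orientation, the MMI criterion that LF-MMI builds on is usually written in the textbook form below. The notation is an assumption for illustration and is not taken from the paper: O_u is the acoustic feature sequence of utterance u, W_u its reference transcript, and W ranges over competing word sequences.

```latex
\mathcal{F}_{\mathrm{MMI}}
  = \sum_{u} \log
    \frac{p(\mathbf{O}_u \mid W_u)\, P(W_u)}
         {\sum_{W} p(\mathbf{O}_u \mid W)\, P(W)}
```

In the lattice-free variant, the denominator sum is typically computed exactly with the forward-backward algorithm over a compact denominator graph rather than approximated with lattices produced by a decoding pass.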
Findings
Outperforms MBR-based methods in accuracy and efficiency.
Achieves state-of-the-art results on Aishell-1 and Aishell-2 datasets.
Demonstrates effectiveness across multiple E2E frameworks.
Abstract
In automatic speech recognition (ASR) research, discriminative criteria have achieved superior performance in DNN-HMM systems. Given this success, the adoption of discriminative criteria is promising to boost the performance of end-to-end (E2E) ASR systems. With this motivation, previous works have introduced the minimum Bayesian risk (MBR, one of the discriminative criteria) into E2E ASR systems. However, the effectiveness and efficiency of the MBR-based methods are compromised: the MBR criterion is only used in system training, which creates a mismatch between training and decoding; the on-the-fly decoding process in MBR-based methods results in the need for pre-trained models and slow training speeds. To this end, novel algorithms are proposed in this work to integrate another widely used discriminative criterion, lattice-free maximum mutual information (LF-MMI), into E2E ASR systems…
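As a rough illustration of what an MMI-style criterion computes, the sketch below scores a reference transcript against a small, explicit set of competitors. It is a toy under stated assumptions, not the paper's algorithm: the function names and the scores are hypothetical, and an explicit competitor list stands in for the denominator graph that LF-MMI would use.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a list of log-scores."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def mmi_log_posterior(ref_score, competitor_scores):
    """Log-posterior of the reference under an MMI-style criterion.

    ref_score and competitor_scores are joint log-scores
    (acoustic + language model) for each word sequence.
    The denominator includes the reference itself.
    """
    denominator = log_sum_exp([ref_score] + list(competitor_scores))
    return ref_score - denominator


# Hypothetical log-scores for one utterance (made up for illustration).
scores = {"ref": -10.0, "hyp_a": -12.0, "hyp_b": -15.0}
objective = mmi_log_posterior(scores["ref"], [scores["hyp_a"], scores["hyp_b"]])
```

Maximizing this quantity raises the reference score relative to all competitors at once. Because the same scores could also be used to rank hypotheses, such a criterion can in principle be applied at decoding time as well as in training.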
