TL;DR
This paper introduces a decoding algorithm for Mandarin end-to-end (E2E) speech recognition that constructs word-level lattices on the fly, enabling effective integration of external word N-gram language models (LMs) and achieving state-of-the-art results.
Contribution
It proposes a new decoding method that constructs word-level lattices dynamically, allowing better use of external word-level language models in Mandarin ASR.
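The idea of a word lattice built on the fly can be illustrated with a toy sketch. It is not the authors' algorithm: the lexicon, the bigram table `BIGRAM`, the back-off constant `UNK_LOGP`, and the class `OnTheFlyLattice` are all hypothetical. The sketch also keeps only the best (Viterbi) score per lattice state, which may differ from how the paper combines competing paths. Each appended character adds only the arcs that end at the new lattice node. This lets a character-level hypothesis be scored incrementally with a word-level LM.

```python
import math

# Toy sketch, not the paper's implementation.
# Hypothetical lexicon and bigram log-probabilities (natural log).
LEXICON = {"南京", "南京市", "市长", "长江", "大桥", "江", "大", "桥", "南", "京", "市", "长"}
BOS = "<s>"
UNK_LOGP = -10.0  # assumed flat penalty for bigrams missing from the table

BIGRAM = {
    (BOS, "南京市"): math.log(0.4),
    (BOS, "南京"): math.log(0.3),
    ("南京市", "长江"): math.log(0.5),
    ("南京", "市长"): math.log(0.2),
    ("长江", "大桥"): math.log(0.6),
    ("市长", "江"): math.log(0.01),
}


def bigram_logp(prev: str, word: str) -> float:
    """Look up log P(word | prev), falling back to a flat penalty."""
    return BIGRAM.get((prev, word), UNK_LOGP)


class OnTheFlyLattice:
    """Word lattice over a growing character prefix.

    Node i is the boundary after i characters. states[i] maps the last word
    of a path ending at node i to the best LM log-score of any such path.
    """

    def __init__(self, max_word_len: int = 4):
        self.chars = ""
        self.max_word_len = max_word_len
        self.states = [{BOS: 0.0}]  # node 0: empty prefix, context is <s>

    def append(self, ch: str) -> None:
        """Extend by one character, adding only arcs that end at the new node."""
        self.chars += ch
        j = len(self.chars)
        new_states = {}
        for i in range(max(0, j - self.max_word_len), j):
            word = self.chars[i:j]
            if word not in LEXICON:
                continue
            # Extend every path ending at node i with the word chars[i:j].
            for prev, score in self.states[i].items():
                cand = score + bigram_logp(prev, word)
                if cand > new_states.get(word, -math.inf):
                    new_states[word] = cand
        self.states.append(new_states)

    def best_score(self) -> float:
        """Best LM log-score over all segmentations of the prefix (-inf if none)."""
        return max(self.states[-1].values(), default=-math.inf)
```

Feeding the characters of 南京市长江大桥 one at a time leaves 南京市/长江/大桥 as the best-scoring path under this toy table, with score log(0.4 × 0.5 × 0.6). Because a bigram LM conditions only on the previous word, one score per (node, last word) pair is enough for exact Viterbi scoring. A higher-order N-gram would need the last N-1 words as the state.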
Findings
Outperforms subword-level LMs in experiments.
Achieves state-of-the-art CER on Aishell datasets.
Reduces CER by 14.8% on a large Mandarin dataset.
Abstract
Despite the rapid progress of end-to-end (E2E) automatic speech recognition (ASR), it has been shown that incorporating external language models (LMs) into decoding can further improve the recognition performance of E2E ASR systems. To align with the modeling units adopted in E2E ASR systems, subword-level (e.g., characters, BPE) LMs are usually used with current E2E ASR systems. However, subword-level LMs ignore word-level information, which may limit the strength of the external LMs in E2E ASR. Although several methods have been proposed to incorporate word-level external LMs in E2E ASR, these methods are mainly designed for languages with clear word boundaries, such as English, and cannot be directly applied to languages like Mandarin, in which each character sequence can have multiple corresponding word sequences. To this end, we propose a novel…
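The segmentation ambiguity can be made concrete with the classic string 南京市长江大桥 ("Nanjing Yangtze River Bridge"). It splits into 南京市/长江/大桥 but also into 南京/市长/江/大桥 ("Nanjing / mayor / Jiang / bridge"). A minimal sketch that enumerates every segmentation allowed by a lexicon follows. The lexicon and the function `segmentations` are hypothetical and only illustrate why a character sequence has no single word sequence to score.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real system would use a full word vocabulary.
LEXICON = {"南京", "南京市", "市长", "长江", "大桥", "江"}


def segmentations(chars, lexicon=LEXICON, max_word_len=4):
    """Return every way to split `chars` into lexicon words, as lists of words."""

    @lru_cache(maxsize=None)
    def from_pos(i):
        # All word sequences covering chars[i:]; [[]] means "nothing left".
        if i == len(chars):
            return [[]]
        results = []
        for j in range(i + 1, min(len(chars), i + max_word_len) + 1):
            word = chars[i:j]
            if word in lexicon:
                for rest in from_pos(j):
                    results.append([word] + rest)
        return results

    return from_pos(0)
```

With this lexicon, `segmentations("南京市长江大桥")` returns both word sequences. The number of paths can grow exponentially with length, which is why a shared lattice (rather than an explicit list) is the natural structure for scoring them with a word LM.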
