Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding
Ruchao Fan, Guoli Ye, Yashesh Gaur, and Jinyu Li

TL;DR
This paper introduces mask sample decoding (MS-decode) to improve MLM-based non-autoregressive spell correction for transformer-transducer speech recognition, reducing WER from 7.6% to 6.5% on Librispeech and CER from 7.3% to 6.1% on Aishell.
Contribution
Proposes a novel mask sample decoding technique to enhance MLM-based non-autoregressive spell correction for speech recognition.
Findings
WER reduced from 7.6% to 6.5% on Librispeech
CER reduced from 7.3% to 6.1% on Aishell
Demonstrates the effectiveness of MS-decode for spell correction of speech recognition output
Abstract
Masked language model (MLM) has been widely used for understanding tasks, e.g. BERT. Recently, MLM has also been used for generation tasks. The most popular one in speech is using Mask-CTC for non-autoregressive speech recognition. In this paper, we take one step further, and explore the possibility of using MLM as a non-autoregressive spell correction (SC) model for transformer-transducer (TT), denoted as MLM-SC. Our initial experiments show that MLM-SC provides no improvements on Librispeech data. The problem might be the choice of modeling units (word pieces) and the inaccuracy of the TT confidence scores for English data. To solve the problem, we propose a mask sample decoding (MS-decode) method where the masked tokens can have the choice of being masked or not to compensate for the inaccuracy. As a result, we reduce the WER of a streaming TT from 7.6% to 6.5% on the Librispeech…
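The abstract's core idea, letting a token flagged for masking be either masked or kept, can be illustrated with a small sketch. This is not the authors' implementation. The threshold, the per-token keep probability, the number of samples, and the function names `threshold_mask` and `sample_masks` are all hypothetical, and the abstract does not say how the sampled hypotheses are scored or combined. The sketch only contrasts deterministic confidence-threshold masking with sampled masking over the same low-confidence positions.

```python
import random

MASK = "<mask>"  # placeholder mask symbol (assumption, not from the paper)


def threshold_mask(tokens, confidences, threshold=0.9):
    """Deterministic baseline: mask every token whose confidence is below the threshold."""
    return [MASK if c < threshold else t for t, c in zip(tokens, confidences)]


def sample_masks(tokens, confidences, threshold=0.9, num_samples=4,
                 keep_prob=0.5, seed=0):
    """Sampled variant: each low-confidence token is masked or kept at random.

    keep_prob is a hypothetical knob; the paper may choose this differently.
    High-confidence tokens are never masked.
    """
    rng = random.Random(seed)
    patterns = []
    for _ in range(num_samples):
        pattern = []
        for t, c in zip(tokens, confidences):
            if c < threshold and rng.random() >= keep_prob:
                pattern.append(MASK)   # candidate position, sampled as masked
            else:
                pattern.append(t)      # confident token, or candidate sampled as kept
        patterns.append(pattern)
    return patterns


# Toy hypothesis with made-up confidence scores.
tokens = ["the", "whether", "is", "nice"]
confidences = [0.99, 0.42, 0.95, 0.61]

print(threshold_mask(tokens, confidences))
for pattern in sample_masks(tokens, confidences):
    print(pattern)
```

Each sampled pattern would then be passed to the MLM-SC model to fill in the masked positions. Because a low-confidence token is not always masked, an unreliable confidence score no longer forces the model to discard a token that was in fact correct.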
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research
Methods: Multi-Head Attention · Attention Is All You Need · Test · Softmax · Adam · Attention Dropout · WordPiece · Layer Normalization · Residual Connection
