FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition
Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Wenjie Liu,, Linquan Liu, Tao Qin, Xiang-Yang Li, Edward Lin, Tie-Yan Liu

TL;DR
FastCorrect 2 introduces a multi-candidate error correction model for ASR that leverages multiple hypotheses to improve word error rate reduction, utilizing novel alignment and candidate selection techniques for faster and more accurate correction.
Contribution
It presents a non-autoregressive, multi-candidate correction model with novel alignment and candidate prediction algorithms, outperforming previous single-candidate correction methods.
Findings
Reduces WER by up to 3.2% over previous models
Leverages multiple ASR candidates for improved correction accuracy
Faster inference due to non-autoregressive generation
Abstract
Error correction is widely used in automatic speech recognition (ASR) to post-process the generated sentence, and can further reduce the word error rate (WER). Although multiple candidates are generated by an ASR system through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect from multiple candidates to better detect and correct error tokens. In this work, we propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy. FastCorrect 2 adopts non-autoregressive generation for fast inference, which consists of an encoder that processes multiple source sentences and a decoder that generates the target sentence in parallel from the adjusted source sentence, where the adjustment is based on the predicted duration of each source token. However, there are…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
