FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition

Yichong Leng; Xu Tan; Rui Wang; Linchen Zhu; Jin Xu; Wenjie Liu; Linquan Liu; Tao Qin; Xiang-Yang Li; Edward Lin; Tie-Yan Liu

arXiv:2109.14420 · cs.CL · November 30, 2022

TL;DR

FastCorrect 2 introduces a multi-candidate error correction model for ASR that leverages multiple hypotheses to reduce the word error rate (WER), using novel alignment and candidate selection techniques for faster and more accurate correction.
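A core step behind multi-candidate alignment is placing candidates of different lengths onto a common grid of positions. This page does not describe FastCorrect 2's own alignment algorithm, so the Python sketch below uses a standard stand-in: a Levenshtein (edit-distance) alignment of one candidate against an anchor candidate, matching tokens by plain string equality and filling gaps with a hypothetical blank token `"<b>"`. The name `align_pair` is illustrative only, and aligning more than two candidates would need an additional merge step that is not shown.

```python
from __future__ import annotations

BLANK = "<b>"  # hypothetical token that marks a gap in an alignment


def align_pair(anchor: list[str], cand: list[str]) -> tuple[list[str], list[str]]:
    """Align `cand` to `anchor` by minimum edit distance with unit costs.

    Returns two lists of equal length; gaps on either side are filled with BLANK.
    Tokens are matched by plain string equality (an assumption of this sketch).
    """
    n, m = len(anchor), len(cand)
    # dist[i][j] = edit distance between anchor[:i] and cand[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if anchor[i - 1] == cand[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + sub,  # match or substitution
                dist[i - 1][j] + 1,        # anchor token has no partner in cand
                dist[i][j - 1] + 1,        # cand token has no partner in anchor
            )

    # Walk back from the full-length corner, preferring diagonal moves on ties.
    out_anchor: list[str] = []
    out_cand: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = 0 if anchor[i - 1] == cand[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + sub:
                out_anchor.append(anchor[i - 1])
                out_cand.append(cand[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            # Anchor token aligned to a gap in the candidate.
            out_anchor.append(anchor[i - 1])
            out_cand.append(BLANK)
            i -= 1
        else:
            # Candidate token aligned to a gap in the anchor.
            out_anchor.append(BLANK)
            out_cand.append(cand[j - 1])
            j -= 1
    # Tokens were collected from the end, so reverse them.
    return out_anchor[::-1], out_cand[::-1]
```

For example, aligning `["i", "like", "tea"]` against the anchor `["i", "like", "green", "tea"]` yields `["i", "like", "<b>", "tea"]` for the candidate, so both sequences now share four positions.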

Contribution

It presents a non-autoregressive, multi-candidate correction model with novel alignment and candidate prediction algorithms, outperforming previous single-candidate correction methods.
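This summary names a candidate prediction step without detailing it. One plausible reading, assumed here and not confirmed by this page, is that the model scores each candidate by the correction loss it expects if the decoder started from that candidate, and inference keeps the lowest-scoring one. A minimal Python sketch of only that final selection (the function `select_candidate` and its inputs are hypothetical):

```python
from __future__ import annotations

from typing import Sequence, TypeVar

T = TypeVar("T")


def select_candidate(candidates: Sequence[T], predicted_losses: Sequence[float]) -> T:
    """Return the candidate whose predicted correction loss is lowest.

    predicted_losses[k] is an assumed model output: an estimate of how large
    the decoder's loss would be if candidate k were used as its input.
    Ties go to the earlier (higher-ranked) candidate, because min() keeps
    the first minimal element it sees.
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) != len(predicted_losses):
        raise ValueError("need one predicted loss per candidate")
    best = min(range(len(candidates)), key=lambda k: predicted_losses[k])
    return candidates[best]
```

With scores `[0.9, 0.2, 0.5]`, the second candidate is chosen even though the first was ranked higher by beam search; the scores, not the beam order, drive the choice.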

Findings

01. Reduces WER by up to 3.2% over the previous single-candidate correction model

02. Leverages multiple ASR candidates for improved correction accuracy

03. Achieves faster inference through non-autoregressive (parallel) generation
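Finding 02 rests on what the abstract calls the voting effect: the intuition is that an error in one beam candidate is often absent from the others. The Python sketch below is a toy illustration of that intuition, not the model itself: position-wise majority voting over candidates that are assumed to be already aligned to equal length, with a hypothetical blank token `"<b>"` for gaps. FastCorrect 2 instead feeds the candidates to its encoder and learns how to combine them.

```python
from __future__ import annotations

from collections import Counter

BLANK = "<b>"  # hypothetical token that marks a gap in an alignment


def majority_vote(aligned_candidates: list[list[str]]) -> list[str]:
    """Pick the most frequent token at each position of pre-aligned candidates.

    Ties go to the token that first appears in beam order (the higher-ranked
    candidate), since Counter keeps insertion order and max() returns the
    first maximal key. Positions won by BLANK are dropped from the output.
    """
    length = len(aligned_candidates[0])
    if any(len(c) != length for c in aligned_candidates):
        raise ValueError("candidates must be aligned to the same length")
    voted: list[str] = []
    for column in zip(*aligned_candidates):
        counts = Counter(column)
        winner = max(counts, key=counts.get)
        if winner != BLANK:
            voted.append(winner)
    return voted
```

Given three aligned hypotheses that each contain one different error, such as `["the", "cat", "sat", "on", "a", "mat"]`, `["the", "cap", "sat", "on", "the", "mat"]` and `["a", "cat", "sat", "on", "the", "mat"]`, the vote returns `["the", "cat", "sat", "on", "the", "mat"]`, recovering a sentence that none of the single candidates contains.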

Abstract

Error correction is widely used in automatic speech recognition (ASR) to post-process the generated sentence, and can further reduce the word error rate (WER). Although multiple candidates are generated by an ASR system through beam search, current error correction approaches can only correct one sentence at a time, failing to leverage the voting effect from multiple candidates to better detect and correct error tokens. In this work, we propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy. FastCorrect 2 adopts non-autoregressive generation for fast inference, which consists of an encoder that processes multiple source sentences and a decoder that generates the target sentence in parallel from the adjusted source sentence, where the adjustment is based on the predicted duration of each source token. However, there are…
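The abstract describes a decoder that generates the target in parallel from a source sentence adjusted by each token's predicted duration. A minimal Python sketch of such an adjustment, assuming integer durations where 0 drops a token (a predicted deletion), 1 keeps it, and k > 1 repeats it k times to leave room for k target tokens; the exact convention in FastCorrect 2 may differ, and `adjust_source` is a hypothetical name:

```python
from __future__ import annotations


def adjust_source(tokens: list[str], durations: list[int]) -> list[str]:
    """Expand or shrink a source sentence by per-token durations.

    duration 0 -> token removed (predicted deletion)
    duration 1 -> token kept once
    duration k -> token repeated k times (room for k target tokens)
    """
    if len(tokens) != len(durations):
        raise ValueError("need one duration per source token")
    if any(d < 0 for d in durations):
        raise ValueError("durations must be non-negative")
    adjusted: list[str] = []
    for token, d in zip(tokens, durations):
        adjusted.extend([token] * d)
    return adjusted
```

Here `adjust_source(["a", "b", "c"], [1, 0, 2])` returns `["a", "c", "c"]`. Because the adjusted length equals `sum(durations)`, the output length is fixed before decoding starts, which is what allows every position to be generated in parallel under this convention.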

Code & Models

Repositories

microsoft/NeuralSpeech (PyTorch, official implementation)

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing