SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition

Yichong Leng; Xu Tan; Wenjie Liu; Kaitao Song; Rui Wang; Xiang-Yang Li; Tao Qin; Edward Lin; Tie-Yan Liu

arXiv:2212.01039·cs.CL·December 21, 2023

TL;DR

SoftCorrect introduces a novel soft error detection mechanism for automatic speech recognition error correction, improving accuracy by focusing correction efforts on likely incorrect words using a dedicated language model.

Contribution

The paper proposes SoftCorrect, an error correction method that combines soft error detection with a constrained CTC loss, outperforming previous approaches in accuracy and speed.

Findings

01 Achieves 26.1% CER reduction on AISHELL-1

02 Achieves 9.4% CER reduction on Aidatatang

03 Outperforms previous error correction methods
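
This page does not say how the CER reductions above are computed. Assuming they are relative reductions in character error rate (character-level edit distance divided by reference length), the arithmetic can be sketched as follows. The function names and example strings are illustrative assumptions and are not taken from the paper or the NeuralSpeech repository.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance between a reference and a hypothesis."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # reference character missing from hypothesis
                curr[j - 1] + 1,     # extra character in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, corrected: float) -> float:
    """Relative improvement of a corrected CER over a baseline CER (assumed metric)."""
    return (baseline - corrected) / baseline


# Hypothetical example: an ASR output before and after error correction.
reference = "speech recognition"
asr_output = "speach recogniton"
corrected_output = "speech recogniton"

baseline_cer = cer(reference, asr_output)
corrected_cer = cer(reference, corrected_output)
print(f"baseline CER:  {baseline_cer:.3f}")
print(f"corrected CER: {corrected_cer:.3f}")
print(f"relative CER reduction: {relative_reduction(baseline_cer, corrected_cer):.1%}")
```

Under this assumed definition, a 26.1% reduction would mean the corrected CER is 73.9% of the baseline CER, not that the CER drops by 26.1 absolute points.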

Abstract

Error correction in automatic speech recognition (ASR) aims to correct the incorrect words in sentences generated by ASR models. Since recent ASR models usually have a low word error rate (WER), to avoid affecting originally correct tokens, error correction models should only modify incorrect words, and therefore detecting incorrect words is important for error correction. Previous works on error correction either implicitly detect error words through target-source attention or CTC (connectionist temporal classification) loss, or explicitly locate specific deletion/substitution/insertion errors. However, implicit error detection does not provide a clear signal about which tokens are incorrect, and explicit error detection suffers from low detection accuracy. In this paper, we propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit…

Code & Models

Repositories

microsoft/NeuralSpeech
PyTorch · Official

Videos

SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition · Underline

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing

Methods: Connectionist Temporal Classification Loss