Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition

Yuchun Shu; Bo Hu; Yifeng He; Hao Shi; Longbiao Wang; Jianwu Dang

arXiv:2407.12817·cs.CL·July 19, 2024

Open Access

TL;DR

This paper introduces a non-autoregressive speech error correction method that uses both acoustic features and confidence scores to locate and correct errors in automatic speech recognition hypotheses, reducing the error rate by 21% compared to baseline ASR.

Contribution

The novel approach combines confidence estimation and acoustic features with a cross-attention mechanism for improved speech error correction.

Findings

01

Reduces error rate by 21% compared to baseline ASR.

02

Utilizes confidence scores to identify error-prone words.

03

Incorporates acoustic features to enhance correction accuracy.

Abstract

The goal of speech error correction is to accurately locate the wrong words in an automatic speech recognition (ASR) hypothesis and recover them on a well-founded basis. In this paper, we propose a non-autoregressive speech error correction method. A Confidence Module measures the uncertainty of each word in the N-best ASR hypotheses, which serves as a reference for locating wrong words. In addition, acoustic features from the ASR encoder provide a reference for the correct pronunciation. The N-best candidates from ASR are aligned along the edit path so that they can confirm one another and recover some missing characters. Furthermore, a cross-attention mechanism fuses the error correction references with the ASR hypothesis. Experimental results show that both the acoustic and confidence references help with error correction. The proposed system reduces the error rate by 21%…
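The abstract says that N-best candidates are aligned using the edit path, but this page gives no further detail. The following is a minimal sketch of one way such an alignment could work, assuming a standard Levenshtein alignment of each candidate against the top hypothesis. The function name `edit_path_align` and the `GAP` placeholder are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Tuple

GAP = None  # hypothetical placeholder marking a missing token on one side


def edit_path_align(
    anchor: List[str], other: List[str]
) -> List[Tuple[Optional[str], Optional[str]]]:
    """Align two token sequences along a minimum-edit-distance path."""
    n, m = len(anchor), len(other)
    # dist[i][j] = edit distance between anchor[:i] and other[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i
    for j in range(1, m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if anchor[i - 1] == other[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # token missing from `other`
                dist[i][j - 1] + 1,         # extra token in `other`
            )

    # Walk back from the bottom-right cell to recover one optimal edit path.
    pairs: List[Tuple[Optional[str], Optional[str]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if anchor[i - 1] == other[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                pairs.append((anchor[i - 1], other[j - 1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((anchor[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, other[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


if __name__ == "__main__":
    top_hypothesis = list("abd")    # toy 1-best with a missing character
    second_best = list("abcd")      # toy 2-best that contains it
    for a, b in edit_path_align(top_hypothesis, second_best):
        print(a, b)
```

In this toy case the aligned position where the top hypothesis has a gap and the second candidate has a token is the kind of slot where one candidate could supply a character that another is missing. How the paper's model actually consumes the aligned candidates is not specified here.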

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing