ASR Error Detection via Audio-Transcript entailment
Nimshi Venkat Meripo, Sandeep Konam

TL;DR
This paper introduces an end-to-end audio-transcript entailment model for detecting ASR errors, with a focus on medical conversations, and reports better error detection than its baseline.
Contribution
To the authors' knowledge, it is the first work to frame ASR error detection as an end-to-end entailment task between an audio segment and its corresponding transcript segment, combining an acoustic encoder with a linguistic encoder.
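A minimal sketch of the entailment framing, under stated assumptions: the two encoders are taken as given and have already produced fixed-size embeddings, and the fusion step (concatenation followed by a logistic layer) is an illustrative choice, not the architecture described in the paper. The names `entailment_score` and `detect_error` are hypothetical.

```python
import math
from typing import Sequence


def entailment_score(audio_vec: Sequence[float],
                     text_vec: Sequence[float],
                     weights: Sequence[float],
                     bias: float) -> float:
    """Return a probability-like score that the transcript matches the audio.

    Assumption: embeddings are fused by simple concatenation and scored
    with one logistic unit. A trained model would learn weights and bias.
    """
    fused = list(audio_vec) + list(text_vec)
    if len(weights) != len(fused):
        raise ValueError("weights must match the fused embedding length")
    z = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def detect_error(audio_vec: Sequence[float],
                 text_vec: Sequence[float],
                 weights: Sequence[float],
                 bias: float,
                 threshold: float = 0.5) -> bool:
    """Flag a segment as containing an ASR error when entailment is weak."""
    return entailment_score(audio_vec, text_vec, weights, bias) < threshold
```

With this framing, a segment pair whose score falls below the threshold is treated as a non-entailing pair, which is to say a likely recognition error.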
Findings
Achieved a classification error rate (CER) of 26.2% on all transcription errors and 23% on medical errors
Improved on the baseline by 12% and 15.4%, respectively
Effective at detecting errors in the medical domain
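Assuming CER in the findings above denotes a classification error rate (the fraction of segments whose error/no-error label is predicted incorrectly) rather than a character error rate, it could be computed roughly as follows. The function name and the 0/1 label encoding are illustrative assumptions; the summary does not specify the exact evaluation protocol.

```python
from typing import Sequence


def classification_error_rate(predicted: Sequence[int],
                              actual: Sequence[int]) -> float:
    """Fraction of segments whose predicted label differs from the true label.

    Labels are assumed to be 1 (segment contains an ASR error) or 0 (no error).
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one segment is required")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


# Four segments, two of them misclassified.
rate = classification_error_rate([1, 0, 0, 1], [1, 1, 0, 0])
print(f"{rate:.1%}")  # prints 50.0%
```

Lower values are better under this definition, so a figure such as 26.2% would mean roughly one in four segments is labeled incorrectly.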
Abstract
Despite improved performances of the latest Automatic Speech Recognition (ASR) systems, transcription errors are still unavoidable. These errors can have a considerable impact in critical domains such as healthcare, when used to help with clinical documentation. Therefore, detecting ASR errors is a critical first step in preventing further error propagation to downstream applications. To this end, we propose a novel end-to-end approach for ASR error detection using audio-transcript entailment. To the best of our knowledge, we are the first to frame this problem as an end-to-end entailment task between the audio segment and its corresponding transcript segment. Our intuition is that there should be a bidirectional entailment between audio and transcript when there is no recognition error and vice versa. The proposed model utilizes an acoustic encoder and a linguistic encoder to model the…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling
