TL;DR
This paper introduces a non-destructive, token-based method using extended Levenshtein distance for more robust Word Error Rate (WER) calculations and detailed error classification in speech recognition transcripts, preserving information on punctuation and capitalization.
Contribution
It presents a novel, non-destructive approach to compute WER and classify errors more granularly, improving upon traditional normalization-based methods.
Findings
On several datasets, the approach yields WER values practically equivalent to those of common normalization-based WER computations.
It enables detailed analysis of punctuation and orthographic errors.
The method is demonstrated through a web application and open-source code.
Abstract
The Word Error Rate (WER) is the common measure of accuracy for Automatic Speech Recognition (ASR). Transcripts are usually pre-processed by substituting specific characters to account for non-semantic differences. As a result of this normalisation, information on the accuracy of punctuation or capitalisation is lost. We present a non-destructive, token-based approach using an extended Levenshtein distance algorithm to compute a robust WER and additional orthographic metrics. Transcription errors are also classified more granularly by existing string similarity and phonetic algorithms. An evaluation on several datasets demonstrates the practical equivalence of our approach compared to common WER computations. We also provide an exemplary analysis of derived use cases, such as a punctuation error rate, and a web application for interactive use and visualisation of our implementation. The…
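To make the idea of a non-destructive, token-based comparison concrete, here is a minimal Python sketch. It is not the authors' implementation: the function names (`tokenize`, `align`, `score`), the cost scheme, and the three error categories are assumptions made for illustration, and the string-similarity and phonetic classification mentioned in the abstract is left out. Instead of stripping punctuation and case before scoring, the sketch keeps them as tokens, aligns both token sequences with a Levenshtein-style dynamic program, and then sorts each mismatch into a word, punctuation, or capitalization error.

```python
import re

INF = float("inf")

def tokenize(text):
    # Keep words (with inner apostrophes) and punctuation marks as separate tokens.
    return re.findall(r"\w+(?:'\w+)*|[^\w\s]", text)

def is_punct(tok):
    return re.fullmatch(r"[^\w\s]", tok) is not None

def sub_cost(r, h):
    # Assumption of this sketch: a word is never aligned with a punctuation mark,
    # and case-only differences are free during alignment (classified afterwards).
    if is_punct(r) != is_punct(h):
        return INF
    return 0 if r.lower() == h.lower() else 1

def align(ref, hyp):
    """Token-level Levenshtein alignment; returns (ref_token, hyp_token) pairs,
    with None marking an inserted or deleted token."""
    n, m = len(ref), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(d[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]),
                          d[i - 1][j] + 1,   # deletion
                          d[i][j - 1] + 1)   # insertion
    # Backtrace from the bottom-right corner, preferring match/substitution.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + sub_cost(ref[i - 1], hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    return pairs[::-1]

def score(reference, hypothesis):
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    errors = {"word": 0, "punctuation": 0, "capitalization": 0}
    for r, h in align(ref, hyp):
        if r == h:
            continue  # exact match, nothing to count
        if r is not None and h is not None and r.lower() == h.lower():
            errors["capitalization"] += 1
        elif is_punct(r if r is not None else h):
            errors["punctuation"] += 1
        else:
            errors["word"] += 1
    n_words = sum(not is_punct(t) for t in ref)
    n_punct = len(ref) - n_words
    return {
        "wer": errors["word"] / max(n_words, 1),
        "punctuation_error_rate": errors["punctuation"] / max(n_punct, 1),
        "capitalization_error_rate": errors["capitalization"] / max(n_words, 1),
    }

result = score("Hello, world. How are you?", "hello world. How is you?")
print(result)
```

For this pair, the sketch counts one word error ("are" vs. "is") over five reference words, one missing comma over three reference punctuation marks, and one capitalization difference, so the word-level rate stays what a normalized comparison would report while the orthographic errors remain visible as separate metrics.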
