Semantic-WER: A Unified Metric for the Evaluation of ASR Transcript for End Usability
Somnath Roy

TL;DR
Semantic-WER (SWER) is introduced as a new evaluation metric for ASR transcripts that incorporates semantic understanding, making it more suitable for downstream tasks like SLU and information retrieval.
Contribution
The paper proposes Semantic-WER (SWER), a novel metric that evaluates ASR transcripts based on semantic content, addressing limitations of traditional WER.
Findings
SWER can be customized for various downstream tasks.
SWER provides a more meaningful evaluation of ASR transcripts than surface-level WER.
The metric improves correlation with task-specific performance.
Abstract
Recent advances in supervised, semi-supervised and self-supervised deep learning algorithms have shown significant improvement in the performance of automatic speech recognition (ASR) systems. The state-of-the-art systems have achieved a word error rate (WER) of less than 5%. However, in the past, researchers have argued that the WER metric is unsuitable for evaluating ASR systems for downstream tasks such as spoken language understanding (SLU) and information retrieval. The reason is that the WER works at the surface level and does not include any syntactic or semantic knowledge. The current work proposes Semantic-WER (SWER), a metric to evaluate ASR transcripts for downstream applications in general. The SWER can be easily customized for any downstream task.
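To make the "surface level" point concrete, here is a minimal sketch of the standard WER computation (word-level edit distance divided by reference length). It is not the SWER metric from the paper, whose definition is not given in this summary. The function name `word_error_rate` and the example sentences are assumptions made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Illustrative baseline only; this is NOT the paper's SWER.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences (not taken from the paper).
reference = "please transfer fifty dollars to john"
entity_error = "please transfer fifty dollars to joan"  # wrong recipient
minor_error = "please transfer fifty dollars too john"  # function-word slip

wer_entity = word_error_rate(reference, entity_error)
wer_minor = word_error_rate(reference, minor_error)
print(wer_entity, wer_minor)
```

Both hypotheses contain one substitution in a six-word reference, so plain WER scores them identically at 1/6, even though only the first changes who receives the money. That is the kind of gap a semantics-aware metric is meant to close.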
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
