Semantic-WER: A Unified Metric for the Evaluation of ASR Transcript for End Usability

Somnath Roy

arXiv:2106.02016 · cs.CL · October 19, 2021 · 6 cites

TL;DR

Semantic-WER (SWER) is introduced as a new evaluation metric for ASR transcripts that incorporates semantic understanding, making it more suitable for downstream tasks like SLU and information retrieval.

Contribution

The paper proposes Semantic-WER (SWER), a novel metric that evaluates ASR transcripts based on semantic content, addressing limitations of traditional WER.

Findings

01

SWER can be customized for various downstream tasks.

02

SWER provides a more meaningful evaluation of ASR transcripts.

03

The metric improves correlation with task-specific performance.

Abstract

Recent advances in supervised, semi-supervised and self-supervised deep learning algorithms have significantly improved the performance of automatic speech recognition (ASR) systems. State-of-the-art systems have achieved a word error rate (WER) of less than 5%. However, researchers have long argued that the WER metric is unsuitable for evaluating ASR systems for downstream tasks such as spoken language understanding (SLU) and information retrieval. The reason is that WER works at the surface level and does not include any syntactic or semantic knowledge. The current work proposes Semantic-WER (SWER), a metric to evaluate ASR transcripts for downstream applications in general. SWER can be easily customized for any downstream task.
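To make the "surface level" point concrete, here is a minimal sketch of the conventional WER computation (word-level edit distance divided by reference length). It is not the SWER metric, whose definition is not given on this page; the function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences (not from the paper).
reference = "please book a flight to boston"
article_error = "please book the flight to boston"  # function word changed
entity_error = "please book a flight to austin"     # destination changed

wer_article = word_error_rate(reference, article_error)
wer_entity = word_error_rate(reference, entity_error)
print(wer_article, wer_entity)
```

Both hypotheses contain exactly one substituted word out of six, so plain WER assigns them the same score, even though only the second would mislead a downstream SLU or retrieval system. This is the kind of gap a semantically weighted metric is meant to close.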

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems