Evaluating User Perception of Speech Recognition System Quality with Semantic Distance Metric

Suyoun Kim; Duc Le; Weiyi Zheng; Tarun Singh; Abhinav Arora; Xiaoyu Zhai; Christian Fuegen; Ozlem Kalinli; Michael L. Seltzer

arXiv:2110.05376 · cs.CL · July 7, 2022 · 5 cites

TL;DR

This paper introduces SemDist, a semantic distance metric for evaluating speech recognition quality, which correlates better with user perception and downstream NLU tasks than traditional WER.

Contribution

The paper proposes SemDist, a novel semantic correctness metric for ASR evaluation that outperforms WER in correlating with user perception and NLU performance.

Findings

01. SemDist correlates more strongly with user perception than WER.

02. SemDist shows higher correlation with downstream NLU tasks.

03. The user-perception result is based on two sets of user-annotated ASR output quality, 71K and 36K in size.

Abstract

Measuring automatic speech recognition (ASR) system quality is critical for creating user-satisfying voice-driven applications. Word Error Rate (WER) has traditionally been used to evaluate ASR system quality; however, it sometimes correlates poorly with user perception and judgment of transcription quality. This is because WER weighs every word equally and does not consider semantic correctness, which has a higher impact on user perception. In this work, we propose evaluating the quality of ASR output hypotheses with SemDist, which measures semantic correctness using the distance between the semantic vectors of the reference and the hypothesis, extracted from a pre-trained language model. Our experiments on 71K and 36K user-annotated ASR outputs show that SemDist achieves higher correlation with user perception than WER. We also show that SemDist has higher correlation with…
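The contrast the abstract draws between WER and SemDist can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract only says "distance between the semantic vectors", so the use of cosine distance here is an assumption, and `toy_embed` is a hypothetical hashed bag-of-words stand-in for the pre-trained language model (it carries no real semantics). The function names `word_error_rate`, `cosine_distance`, and `sem_dist` are likewise chosen for illustration.

```python
import math
import zlib
from typing import Callable, List, Sequence

Vector = Sequence[float]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                            # deletion
                curr[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),   # substitution or match
            ))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def cosine_distance(u: Vector, v: Vector) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def sem_dist(reference: str, hypothesis: str,
             embed: Callable[[str], Vector]) -> float:
    """Distance between the embeddings of reference and hypothesis.

    Assumption: cosine distance is used as the vector distance.
    """
    return cosine_distance(embed(reference), embed(hypothesis))


def toy_embed(text: str, dim: int = 64) -> List[float]:
    """Hypothetical stand-in for a pre-trained language model embedding.

    Hashes each lowercased token into one of `dim` buckets and counts it.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


if __name__ == "__main__":
    ref = "play some music"
    hyp = "play sum music"
    print("WER:", word_error_rate(ref, hyp))
    print("SemDist (toy embedding):", sem_dist(ref, hyp, toy_embed))
```

To approximate the setup the abstract describes, `toy_embed` would be replaced by a function that returns a sentence vector from a pre-trained language model; the rest of the sketch is agnostic to where the vectors come from. WER charges the same cost for every wrong word, whereas the embedding distance depends on how far the hypothesis vector moves from the reference vector.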

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems