Evaluating the Usability of Automatically Generated Captions for People who are Deaf or Hard of Hearing
Sushant Kafle, Matt Huenerfauth

TL;DR
This paper introduces a new evaluation metric for automatic speech recognition (ASR) captions that better predicts their usability for deaf or hard of hearing (DHH) users, correlating more closely with user preferences than the traditional Word Error Rate (WER) metric.
Contribution
The paper proposes a novel captioning-focused evaluation metric for ASR that aligns more closely with user preferences and usability for DHH individuals, validated through a user study.
Findings
New metric predicts user preferences better than WER
Higher correlation than WER with subjective usability scores
For caption pairs with identical WER, participants preferred the one rated higher by the new metric
Abstract
The accuracy of Automated Speech Recognition (ASR) technology has improved, but it is still imperfect in many settings. Researchers who evaluate ASR performance often focus on improving the Word Error Rate (WER) metric, but WER has been found to have little correlation with human-subject performance on many applications. We propose a new captioning-focused evaluation metric that better predicts the impact of ASR recognition errors on the usability of automatically generated captions for people who are Deaf or Hard of Hearing (DHH). Through a user study with 30 DHH users, we compared our new metric with the traditional WER metric on a caption usability evaluation task. In a side-by-side comparison of pairs of ASR text output (with identical WER), the texts preferred by our new metric were preferred by DHH participants. Further, our metric had significantly higher correlation with DHH…
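Why can two captions with identical WER differ in usability? WER counts every substituted, deleted, or inserted word equally, regardless of how much that word matters to meaning. The sketch below is a minimal, standard word-level WER (Levenshtein edit distance divided by reference length). It is not the authors' proposed metric, which this page does not describe. The example sentences are hypothetical: one caption garbles a content word ("friday" becomes "monday") and the other a function word ("the" becomes "a"), yet both receive the same WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / number of
    reference words, computed via word-level Levenshtein alignment."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


# Hypothetical example: two captions, one word wrong in each.
reference = "the meeting is moved to friday"
hyp_a = "the meeting is moved to monday"  # content word garbled
hyp_b = "a meeting is moved to friday"    # function word garbled

print(f"{word_error_rate(reference, hyp_a):.3f}")  # 0.167
print(f"{word_error_rate(reference, hyp_b):.3f}")  # 0.167
```

Both captions score 1/6, but a reader who sees "monday" receives the wrong information, while "a meeting" barely changes the meaning. A usability-oriented metric needs to separate cases like these, and this is the kind of gap the paper's side-by-side comparison of identical-WER pairs targets.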
