Evaluating the Usability of Automatically Generated Captions for People who are Deaf or Hard of Hearing

Sushant Kafle; Matt Huenerfauth

arXiv:1712.02033 · cs.HC · December 29, 2017

TL;DR

This paper introduces a new evaluation metric for automatic speech recognition (ASR) captions. The metric predicts caption usability for people who are Deaf or Hard of Hearing (DHH) better than the traditional Word Error Rate (WER), and it correlates more strongly with user preferences.

Contribution

The paper proposes a novel captioning-focused evaluation metric for ASR that aligns more closely with user preferences and usability for DHH individuals, validated through a user study.

Findings

01 New metric better predicts user preferences than WER

02 Higher correlation between new metric and subjective usability scores

03 Participants preferred captions rated higher by the new metric

Abstract

The accuracy of Automated Speech Recognition (ASR) technology has improved, but it is still imperfect in many settings. Researchers who evaluate ASR performance often focus on improving the Word Error Rate (WER) metric, but WER has been found to have little correlation with human-subject performance on many applications. We propose a new captioning-focused evaluation metric that better predicts the impact of ASR recognition errors on the usability of automatically generated captions for people who are Deaf or Hard of Hearing (DHH). Through a user study with 30 DHH users, we compared our new metric with the traditional WER metric on a caption usability evaluation task. In a side-by-side comparison of pairs of ASR text output (with identical WER), the texts preferred by our new metric were preferred by DHH participants. Further, our metric had significantly higher correlation with DHH…
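For context on the baseline, WER is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the ASR output and a reference transcript, divided by the number of reference words. The sketch below implements that standard definition only; it is not the authors' proposed metric, and the function name and example sentences are invented for illustration. It shows how two outputs can share the same WER even though one error falls on a function word and the other on a content word, which is the kind of pair the side-by-side comparison relies on.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example sentences (not taken from the paper's stimuli).
reference = "the meeting is moved to friday at noon"
hyp_a = "a meeting is moved to friday at noon"    # error on a function word
hyp_b = "the meeting is moved to friday at new"   # error on a content word

wer_a = word_error_rate(reference, hyp_a)
wer_b = word_error_rate(reference, hyp_b)
print(wer_a, wer_b)
```

Both hypotheses contain one substitution against an eight-word reference, so WER treats them as equally good, even though a reader may find them very different in how much meaning is lost.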
