Automatic Speech Recognition Services: Deaf and Hard-of-Hearing Usability

Abraham Glasser

arXiv:1909.02853·cs.HC·September 10, 2019

Open Access

TL;DR

This paper evaluates how well current Automatic Speech Recognition (ASR) systems perform on the voices of Deaf and Hard-of-Hearing (DHH) speakers, highlighting both the remaining challenges and the potential accessibility improvements for DHH users.

Contribution

It assesses how well existing ASR systems perform with DHH speech and explores the impact of custom vocabulary models on recognition accuracy.

Findings

01. ASR systems can reach Word Error Rates (WER) as low as 5-6% on standard speech.

02. Custom vocabulary models improve recognition accuracy for DHH speech.

03. Current ASR systems still face challenges in accurately transcribing DHH speech.

Abstract

Nowadays, speech is becoming a more common, if not standard, interface to technology. This can be seen in the trend of technology changes over the years. Increasingly, voice is used to control programs, appliances and personal devices within homes, cars, workplaces, and public spaces through smartphones and home assistant devices using Amazon's Alexa, Google's Assistant and Apple's Siri, and other proliferating technologies. However, most speech interfaces are not accessible for Deaf and Hard-of-Hearing (DHH) people. In this paper, performances of current Automatic Speech Recognition (ASR) with voices of DHH speakers are evaluated. ASR has improved over the years, and is able to reach Word Error Rates (WER) as low as 5-6% [1][2][3], with the help of cloud-computing and machine learning algorithms that take in custom vocabulary models. In this paper, a custom vocabulary model is used,…
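Because the abstract reports results as Word Error Rate, it may help to recall how that metric is usually computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the paper's own evaluation code. The function name `word_error_rate`, the lowercasing and whitespace tokenization, and the example sentences are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Hypothetical helper: text normalization here (lowercase, whitespace split)
    is an assumption, and real evaluations may normalize differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match (cost 0) or substitution (cost 1)
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example sentences, not data from the paper.
reference = "turn on the kitchen light"
hypothesis = "turn on a kitchen"
print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
```

In this made-up example the output has one substituted word and one missing word against a five-word reference, so the WER is 2/5. Note that WER can exceed 100% when the output contains many inserted words.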

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing