Analysis and Tuning of a Voice Assistant System for Dysfluent Speech

Vikramjit Mitra; Zifang Huang; Colin Lea; Lauren Tooley; Sarah Wu; Darren Botten; Ashwini Palekar; Shrinath Thelapurath; Panayiotis Georgiou; Sachin Kajarekar; Jefferey Bigham

arXiv:2106.11759 · eess.AS · June 23, 2021

TL;DR

This paper analyzes the performance of a consumer speech recognition system for individuals with speech disorders and demonstrates that simple tuning of decoding parameters can significantly improve recognition accuracy for dysfluent speech.

Contribution

It provides a quantitative analysis of recognition errors for dysfluent speech and shows that tuning decoding parameters enhances performance for voice assistant tasks.
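The decoding parameters that were tuned are not named in this summary. As a minimal sketch, assuming a caller-supplied `evaluate` function that decodes a held-out set with a given parameter setting and returns its isWER, an exhaustive grid search selects the lowest-error setting. The parameter names `lm_weight` and `insertion_penalty` are hypothetical illustrations, not the paper's actual knobs, and grid search is a generic stand-in for whatever tuning procedure the authors used.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence, Tuple

Params = Dict[str, float]


def grid_search(
    grid: Dict[str, Sequence[float]],
    evaluate: Callable[[Params], float],
) -> Tuple[Optional[Params], float]:
    """Try every combination in `grid`; return the one with the lowest error.

    `evaluate` is assumed to decode a held-out set with the given
    decoding parameters and return its isWER (lower is better).
    """
    names = sorted(grid)  # fixed order so results are reproducible
    best_params: Optional[Params] = None
    best_err = float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        err = evaluate(params)
        if err < best_err:  # strict '<' keeps the first setting on ties
            best_params, best_err = params, err
    return best_params, best_err


if __name__ == "__main__":
    # Hypothetical knobs; a real decoder exposes its own parameter names.
    grid = {"lm_weight": [0.5, 0.8, 1.0], "insertion_penalty": [-2.0, -1.0, 0.0]}

    # Toy stand-in for "decode the dev set and score isWER".
    def toy_evaluate(p: Params) -> float:
        return (p["lm_weight"] - 0.8) ** 2 + (p["insertion_penalty"] + 1.0) ** 2

    print(grid_search(grid, toy_evaluate))
```

A coarse grid is cheap when the acoustic and language models stay fixed and only search-time weights change, which fits the production-oriented framing; random or finer search follows the same pattern with a different candidate generator.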

Findings

01

Baseline intended speech WER (isWER) for dysfluent speech is 13.64% worse (absolute).

02

Tuning decoding parameters yields a 24% relative improvement in isWER.

03

Domain and intent recognition also improve slightly after tuning.
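The abstract reports the 13.64% baseline gap as absolute, while the 24% tuning gain is relative, and the two are not interchangeable. A short sketch of both measures, using a made-up baseline rather than any figure from the paper, shows that the same relative reduction removes more percentage points from a higher baseline. The function names are illustrative.

```python
def absolute_reduction(baseline: float, tuned: float) -> float:
    """Drop in error rate, in the same units as the inputs (e.g. percentage points)."""
    return baseline - tuned


def relative_reduction(baseline: float, tuned: float) -> float:
    """Drop in error rate as a fraction of the baseline error rate."""
    if baseline <= 0:
        raise ValueError("baseline error rate must be positive")
    return (baseline - tuned) / baseline


def apply_relative_reduction(baseline: float, fraction: float) -> float:
    """Error rate left after reducing `baseline` by `fraction` of itself."""
    return baseline * (1.0 - fraction)


if __name__ == "__main__":
    # Hypothetical baseline isWER of 20.0%, NOT a number from the paper.
    tuned = apply_relative_reduction(20.0, 0.24)
    print(f"tuned isWER: {tuned:.2f}%")
    print(f"absolute drop: {absolute_reduction(20.0, tuned):.2f} points")
    print(f"relative drop: {relative_reduction(20.0, tuned):.0%}")
```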

Abstract

Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice-operated systems do not work. Current speech recognition systems are trained primarily with data from fluent speakers and as a consequence do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. The focus of this work is on quantitative analysis of a consumer speech recognition system on individuals who stutter and production-oriented approaches for improving performance for common voice assistant tasks (e.g., "what is the weather?"). At baseline, this system introduces a significant number of insertion and substitution errors resulting in intended speech Word Error Rates (isWER) that are 13.64% worse (absolute) for individuals with fluency…
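As its name suggests, isWER scores the recognizer against what the speaker intended to say. A minimal sketch, assuming the reference transcript contains only the intended words (so a transcribed word repetition counts as an insertion), is the standard word-level Levenshtein alignment that splits errors into substitutions (S), deletions (D), and insertions (I), with WER = (S + D + I) / N for N reference words. The function names are illustrative, and the paper's exact text normalization and scoring rules are not given in this summary.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class ErrorCounts:
    substitutions: int
    deletions: int
    insertions: int
    ref_words: int

    @property
    def wer(self) -> float:
        # WER = (S + D + I) / N, with N = number of reference words
        if self.ref_words == 0:
            raise ValueError("reference transcript is empty")
        return (self.substitutions + self.deletions + self.insertions) / self.ref_words


def align_counts(reference: str, hypothesis: str) -> ErrorCounts:
    """Minimum-edit word alignment; returns S, D, I counts for one utterance."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    n, m = len(ref), len(hyp)
    # cell[i][j] = (total edits, S, D, I) for aligning ref[:i] with hyp[:j];
    # tuples compare element-wise, so min() picks the fewest total edits first.
    cell: List[List[Tuple[int, int, int, int]]] = [
        [(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)
    ]
    for i in range(1, n + 1):
        cell[i][0] = (i, 0, i, 0)  # delete every reference word
    for j in range(1, m + 1):
        cell[0][j] = (j, 0, 0, j)  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            t, s, d, ins = cell[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (t, s, d, ins)  # match: no cost
            else:
                diag = (t + 1, s + 1, d, ins)  # substitution
            t, s, d, ins = cell[i - 1][j]
            up = (t + 1, s, d + 1, ins)  # deletion of ref[i-1]
            t, s, d, ins = cell[i][j - 1]
            left = (t + 1, s, d, ins + 1)  # insertion of hyp[j-1]
            cell[i][j] = min(diag, up, left)
    _, s, d, ins = cell[n][m]
    return ErrorCounts(substitutions=s, deletions=d, insertions=ins, ref_words=n)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pool errors and reference words over all (reference, hypothesis) pairs."""
    counts = [align_counts(ref, hyp) for ref, hyp in pairs]
    errors = sum(c.substitutions + c.deletions + c.insertions for c in counts)
    words = sum(c.ref_words for c in counts)
    return errors / words


if __name__ == "__main__":
    # Toy example: word repetitions the recognizer transcribes verbatim
    counts = align_counts("what is the weather", "what what is the the weather")
    print(counts, counts.wer)
```

Pooling counts across utterances, rather than averaging per-utterance WER, keeps short voice assistant commands from dominating the score; whether the paper pools this way is not stated here.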
