Analysis and Tuning of a Voice Assistant System for Dysfluent Speech
Vikramjit Mitra, Zifang Huang, Colin Lea, Lauren Tooley, Sarah Wu, Darren Botten, Ashwini Palekar, Shrinath Thelapurath, Panayiotis Georgiou, Sachin Kajarekar, Jefferey Bigham

TL;DR
This paper analyzes the performance of a consumer speech recognition system for individuals with speech disorders and demonstrates that simple tuning of decoding parameters can significantly improve recognition accuracy for dysfluent speech.
Contribution
It provides a quantitative analysis of recognition errors for dysfluent speech and shows that tuning decoding parameters enhances performance for voice assistant tasks.
Findings
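At baseline, the intended speech Word Error Rate (isWER) for dysfluent speech is 13.64% worse (absolute), driven largely by insertion and substitution errors.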
Tuning decoding parameters improves isWER by 24% (relative).
Domain and intent recognition also improve slightly after tuning.
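The findings above mix absolute and relative changes in a word error rate, so a small sketch of both computations may help. It assumes that isWER is an ordinary word error rate scored against the words the speaker intended to say; that reading is inferred from the metric's name and is not a definition taken from the paper. The function names and the example numbers are hypothetical and are not the paper's data or code.

```python
def word_errors(reference, hypothesis):
    """Count (substitutions, deletions, insertions) in a minimum-cost
    word-level alignment of hypothesis against reference."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        dp[i][0] = (i, 0, i, 0)  # every reference word deleted
    for j in range(1, len(hyp) + 1):
        dp[0][j] = (j, 0, 0, j)  # every hypothesis word inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            c, s, d, n = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (c, s, d, n)          # exact match, no cost
            else:
                diag = (c + 1, s + 1, d, n)  # substitution
            c, s, d, n = dp[i - 1][j]
            dele = (c + 1, s, d + 1, n)      # reference word missing
            c, s, d, n = dp[i][j - 1]
            ins = (c + 1, s, d, n + 1)       # extra hypothesis word
            dp[i][j] = min(diag, dele, ins)  # lowest total cost wins
    _, subs, dels, inss = dp[-1][-1]
    return subs, dels, inss


def wer(reference, hypothesis):
    """Word error rate: (S + D + I) / number of reference words."""
    n_ref = len(reference.split())
    if n_ref == 0:
        raise ValueError("reference must contain at least one word")
    subs, dels, inss = word_errors(reference, hypothesis)
    return (subs + dels + inss) / n_ref


def relative_improvement(baseline, tuned):
    """Fraction of the baseline error rate removed by tuning."""
    return (baseline - tuned) / baseline


# Hypothetical example: word repetitions transcribed as extra words.
intended = "what is the weather"
recognized = "what what is is the weather"
print(word_errors(intended, recognized))
print(wer(intended, recognized))
# Made-up rates, only to contrast absolute and relative change.
print(relative_improvement(0.20, 0.15))
```

In this sketch, repeated words that reach the transcript are counted as insertions against the intended sentence, which matches the kind of error the summary describes. The last call shows why the qualifier matters: a drop from 0.20 to 0.15 is 5 points absolute but 25% relative.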
Abstract
Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice-operated systems do not work. Current speech recognition systems are trained primarily with data from fluent speakers and as a consequence do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. The focus of this work is on quantitative analysis of a consumer speech recognition system on individuals who stutter and production-oriented approaches for improving performance for common voice assistant tasks (e.g., "what is the weather?"). At baseline, this system introduces a significant number of insertion and substitution errors resulting in intended speech Word Error Rates (isWER) that are 13.64% worse (absolute) for individuals with fluency…
