Enhancing AAC Software for Dysarthric Speakers in e-Health Settings: An Evaluation Using TORGO
Macarious Hui, Jinda Zhang, Aanchan Mohan

TL;DR
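This paper evaluates and improves automatic speech recognition (ASR) for dysarthric speakers in healthcare settings. It removes prompt-overlap between training and test speakers in the TORGO dataset and applies language models and LLM-based error correction on top of state-of-the-art ASR models, with the aim of improving communication tools for individuals with speech impairments.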
Contribution
It introduces an algorithm to eliminate prompt-overlap in the TORGO dataset and assesses the impact of language models and LLM-based error correction on ASR performance for dysarthric speech.
Findings
Prompt-overlap significantly affects ASR evaluation.
State-of-the-art ASR models perform poorly on dysarthric speech.
LLM-based error correction improves recognition accuracy.
Abstract
Individuals with cerebral palsy (CP) and amyotrophic lateral sclerosis (ALS) frequently face challenges with articulation, leading to dysarthria and resulting in atypical speech patterns. In healthcare settings, communication breakdowns reduce the quality of care. While building an augmentative and alternative communication (AAC) tool to enable fluid communication, we found that state-of-the-art (SOTA) automatic speech recognition (ASR) technology like Whisper and Wav2vec2.0 marginalizes atypical speakers, largely due to the lack of training data. Our work looks to leverage SOTA ASR followed by domain-specific error correction. English dysarthric ASR performance is often evaluated on the TORGO dataset. Prompt-overlap is a well-known issue with this dataset, where phrases overlap between training and test speakers. Our work proposes an algorithm to break this prompt-overlap. After reducing…
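To make the prompt-overlap issue concrete, the following is a minimal sketch of one simple way to break overlap in a speaker-based split: any training utterance whose normalized prompt also appears among the test speakers' prompts is dropped. This is an illustrative assumption, not the algorithm proposed in the paper, whose details are not given here. The function names (`normalize`, `remove_prompt_overlap`), the speaker IDs, and the example prompts are all hypothetical.

```python
import re


def normalize(prompt: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9' ]+", " ", prompt.lower())
    return " ".join(cleaned.split())


def remove_prompt_overlap(utterances, test_speakers):
    """Split (speaker, prompt) pairs into train/test with no shared prompts.

    Assumption: overlap is resolved by dropping training utterances,
    so the test set stays intact. This is a sketch, not the paper's method.
    """
    test_speakers = set(test_speakers)
    # Every prompt spoken by a held-out test speaker, in normalized form.
    test_prompts = {normalize(p) for s, p in utterances if s in test_speakers}

    train, test, dropped = [], [], []
    for speaker, prompt in utterances:
        if speaker in test_speakers:
            test.append((speaker, prompt))
        elif normalize(prompt) in test_prompts:
            dropped.append((speaker, prompt))  # prompt also occurs in test
        else:
            train.append((speaker, prompt))
    return train, test, dropped


# Toy data with hypothetical speaker IDs and prompts.
utterances = [
    ("spk_a", "The quick brown fox."),
    ("spk_a", "Open the door"),
    ("spk_b", "the quick brown fox"),
    ("spk_b", "Call the nurse"),
    ("spk_c", "Open the door!"),
]
train, test, dropped = remove_prompt_overlap(utterances, {"spk_c"})
print(len(train), len(test), len(dropped))
```

Filtering the training side keeps every test utterance available for evaluation, at the cost of discarding some training data. Other choices, such as assigning disjoint prompt sets to each side of the split, are equally plausible under these assumptions.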
Taxonomy
Topics: Assistive Technology in Communication and Mobility · Speech and dialogue systems
