Robust Cross-Etiology and Speaker-Independent Dysarthric Speech Recognition

Satwinder Singh; Qianli Wang; Zihan Zhong; Clarion Mendes; Mark Hasegawa-Johnson; Waleed Abdulla; Seyed Reza Shahamiri

arXiv:2501.14994·cs.SD·January 28, 2025

Open Access

TL;DR

This paper introduces a speaker-independent dysarthric speech recognition system, built on the Whisper model, that generalizes across speakers and etiologies. It reaches a CER of 6.99% and a WER of 10.71% on SAP-1005, and a CER of 25.08% and a WER of 39.56% on TORGO.

Contribution

Developed a novel speaker-independent dysarthric speech recognition system that performs well across different speakers and etiologies, leveraging the Whisper model for improved robustness.

Findings

01

Achieved a character error rate (CER) of 6.99% and a word error rate (WER) of 10.71% on the SAP-1005 dataset.

02

Achieved a CER of 25.08% and a WER of 39.56% on the TORGO dataset.

03

Demonstrated cross-etiology generalization across Parkinson's disease (PD), cerebral palsy (CP), and amyotrophic lateral sclerosis (ALS).
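
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how WER and CER are conventionally defined: the edit (Levenshtein) distance between a reference transcript and a recognizer's output, divided by the reference length in words or characters. It is not the authors' evaluation code. The function names `edit_distance`, `wer`, and `cer` are hypothetical, and details such as text normalization or whether spaces count as characters are assumptions that vary between toolkits.

```python
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance: minimum number of substitutions,
    deletions, and insertions needed to turn ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edit distance / reference length.
    Assumption: spaces are counted as characters (conventions differ)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    ref = "turn the light on"
    hyp = "turn light on"
    print(f"WER = {wer(ref, hyp):.2%}")  # one deleted word out of four
    print(f"CER = {cer(ref, hyp):.2%}")  # four deleted characters out of 17
```

In this toy example one of four reference words is dropped, so the WER is 25%, while the CER is 4/17 (about 23.5%) because the dropped word and its trailing space account for four of the 17 reference characters. Because insertions count as errors, both rates can exceed 100%.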

Abstract

In this paper, we present a speaker-independent dysarthric speech recognition system, with a focus on evaluating the recently released Speech Accessibility Project (SAP-1005) dataset, which includes speech data from individuals with Parkinson's disease (PD). Despite the growing body of research in dysarthric speech recognition, many existing systems are speaker-dependent and adaptive, limiting their generalizability across different speakers and etiologies. Our primary objective is to develop a robust speaker-independent model capable of accurately recognizing dysarthric speech, irrespective of the speaker. Additionally, as a secondary objective, we aim to test the cross-etiology performance of our model by evaluating it on the TORGO dataset, which contains speech samples from individuals with cerebral palsy (CP) and amyotrophic lateral sclerosis (ALS). By leveraging the Whisper model,…

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Phonetics and Phonology Research
