A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition
Shiyao Wang, Jiaming Zhou, Shiwan Zhao, Yong Qin

TL;DR
This paper introduces a self-training method that improves Whisper's recognition of long dysarthric speech. The method increases the amount of training data and adapts the model to the incomplete speech segments it may encounter at inference, and the resulting system placed second in Word Error Rate in the SAP Challenge.
Contribution
A novel self-training approach that enhances Whisper's performance on long dysarthric speech recognition using a large, diverse dataset and handling incomplete speech segments.
Findings
Achieved second place in Word Error Rate in the SAP Challenge
Improved recognition of long dysarthric speech segments
Enhanced model robustness to incomplete speech data
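Word Error Rate, the metric named in the findings above, is conventionally the word-level edit distance between a reference transcript and the system's hypothesis, divided by the number of reference words. The sketch below is a minimal illustration that assumes plain whitespace tokenization and no text normalization; the challenge's actual scoring pipeline is not described here and may differ. The function name `word_error_rate` is illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace, with no casing or
    punctuation normalization applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the")
    # against six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, the value can exceed 1.0 when the hypothesis is much longer than the reference.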
Abstract
Dysarthric speech recognition (DSR) enhances the accessibility of smart devices for dysarthric speakers with limited mobility. Previously, DSR research was constrained by the fact that existing datasets typically consisted of isolated words, command phrases, and a limited number of sentences spoken by a few individuals. This constrained research to command-interaction systems and speaker adaptation. The Speech Accessibility Project (SAP) changed this by releasing a large and diverse English dysarthric dataset, leading to the SAP Challenge to build speaker- and text-independent DSR systems. We enhanced the Whisper model's performance on long dysarthric speech via a novel self-training method. This method increased training data and adapted the model to handle potentially incomplete speech segments encountered during inference. Our system achieved second place in both Word Error Rate and…
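The abstract does not spell out the self-training procedure, so the following is only a generic pseudo-labeling sketch, not the authors' method. It relies on one known fact, that Whisper consumes audio in 30-second windows, so a long recording has to be split and a fixed boundary can cut a word in half, producing the kind of incomplete segment the abstract mentions. Everything else is an assumption made for illustration: the name `pseudo_label_windows`, the `transcribe` callable (a stand-in for whatever model produces the labels), and the choice of non-overlapping windows.

```python
from typing import Callable, List, Sequence, Tuple


def pseudo_label_windows(
    samples: Sequence[float],
    sample_rate: int,
    transcribe: Callable[[Sequence[float], int], str],
    window_seconds: float = 30.0,
) -> List[Tuple[Sequence[float], str]]:
    """Split a long recording into fixed windows and label each one.

    `transcribe` is a hypothetical callable standing in for the current
    model; it takes (samples, sample_rate) and returns text. The returned
    (audio, text) pairs could serve as extra segment-level training data.
    """
    if sample_rate <= 0 or window_seconds <= 0:
        raise ValueError("sample_rate and window_seconds must be positive")

    window = int(sample_rate * window_seconds)
    pairs: List[Tuple[Sequence[float], str]] = []
    # Non-overlapping windows; the last one may be shorter than the rest,
    # and any boundary may fall in the middle of a word.
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        text = transcribe(chunk, sample_rate).strip()
        if text:  # skip windows for which the model returns no text
            pairs.append((chunk, text))
    return pairs


if __name__ == "__main__":
    # Stub transcriber so the sketch runs without any model installed.
    def fake_transcribe(chunk: Sequence[float], rate: int) -> str:
        return f"{len(chunk)} samples"

    audio = [0.0] * 10
    for chunk, text in pseudo_label_windows(audio, 2, fake_transcribe, 2.0):
        print(len(chunk), text)
```

A real pipeline would likely also filter low-confidence pseudo-labels before training on them; that step is omitted because the source gives no detail about it.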
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Phonocardiography and Auscultation Techniques
