A Self-Training Approach for Whisper to Enhance Long Dysarthric Speech Recognition

Shiyao Wang; Jiaming Zhou; Shiwan Zhao; Yong Qin

arXiv:2506.22810 · cs.SD · July 1, 2025

TL;DR

This paper introduces a self-training method that improves Whisper's recognition of long dysarthric speech. The method enlarges the training data and adapts the model to potentially incomplete speech segments, and the resulting system placed second in Word Error Rate in the SAP Challenge.

Contribution

A novel self-training approach that enhances Whisper's performance on long dysarthric speech recognition using a large, diverse dataset and handling incomplete speech segments.

Findings

01 Achieved second place in Word Error Rate in the SAP Challenge

02 Improved recognition of long dysarthric speech segments

03 Enhanced model robustness to incomplete speech data
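
Word Error Rate, the metric behind the first finding, is conventionally the word-level edit distance between a reference transcript and the system output, divided by the number of reference words. The sketch below shows that standard definition only. The function name `word_error_rate` is illustrative, and the challenge's own text normalization and scoring scripts are not described here, so this should not be read as the official scorer.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (substitutions + deletions + insertions) / reference words.

    Assumption: whitespace tokenization with no text normalization;
    real scoring pipelines usually normalize case and punctuation first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of four reference words.
    print(word_error_rate("turn on the light", "turn on light"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.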

Abstract

Dysarthric speech recognition (DSR) enhances the accessibility of smart devices for dysarthric speakers with limited mobility. Previously, DSR research was constrained by the fact that existing datasets typically consisted of isolated words, command phrases, and a limited number of sentences spoken by a few individuals. This constrained research to command-interaction systems and speaker adaptation. The Speech Accessibility Project (SAP) changed this by releasing a large and diverse English dysarthric dataset, leading to the SAP Challenge to build speaker- and text-independent DSR systems. We enhanced the Whisper model's performance on long dysarthric speech via a novel self-training method. This method increased training data and adapted the model to handle potentially incomplete speech segments encountered during inference. Our system achieved second place in both Word Error Rate and…
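
The abstract does not spell out the self-training procedure, so the following is only a generic pseudo-labeling sketch under stated assumptions, not the authors' method. It assumes long recordings are cut into fixed windows (Whisper's encoder takes 30 seconds of 16 kHz audio at a time), that an already fine-tuned model supplies transcripts for each window, and that those window/transcript pairs are added to the training set. Fixed-length cuts can fall mid-word, which is one plausible way incomplete segments arise. `split_into_windows`, `pseudo_label`, and the `transcribe` callable are hypothetical names introduced for illustration.

```python
from typing import Callable, List, Sequence, Tuple

SAMPLE_RATE = 16_000     # Whisper expects 16 kHz audio
WINDOW_SECONDS = 30.0    # Whisper's fixed input window length


def split_into_windows(
    num_samples: int,
    sample_rate: int = SAMPLE_RATE,
    window_seconds: float = WINDOW_SECONDS,
) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges covering the recording.

    Cuts are purely length-based, so a window may begin or end mid-word.
    """
    window = int(window_seconds * sample_rate)
    if window <= 0:
        raise ValueError("window must span at least one sample")
    return [
        (start, min(start + window, num_samples))
        for start in range(0, num_samples, window)
    ]


def pseudo_label(
    audio: Sequence[float],
    transcribe: Callable[[Sequence[float]], str],
    sample_rate: int = SAMPLE_RATE,
    window_seconds: float = WINDOW_SECONDS,
) -> List[Tuple[Tuple[int, int], str]]:
    """Pair each window with the transcript a current model produces for it.

    `transcribe` is a hypothetical stand-in for a fine-tuned recognizer.
    Windows with empty transcripts are dropped; a real pipeline would
    likely apply a stronger confidence filter.
    """
    pairs = []
    for start, end in split_into_windows(len(audio), sample_rate, window_seconds):
        text = transcribe(audio[start:end]).strip()
        if text:
            pairs.append(((start, end), text))
    return pairs


if __name__ == "__main__":
    # A 70-second recording yields two full windows and a 10-second remainder.
    print(split_into_windows(70 * SAMPLE_RATE))
```

The resulting pairs would then be mixed with the labeled data for another round of fine-tuning, which is the usual shape of a self-training loop.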

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Phonocardiography and Auscultation Techniques