Cross-Learning Fine-Tuning Strategy for Dysarthric Speech Recognition Via CDSD database

Qing Xiao; Yingshan Peng; PeiPei Zhang

arXiv:2508.18732 · cs.SD · August 27, 2025

TL;DR

This paper introduces a multi-speaker fine-tuning approach for dysarthric speech recognition that improves accuracy and generalization by leveraging broader pathological features, outperforming traditional single-speaker methods.

Contribution

The paper proposes a cross-learning fine-tuning strategy that trains on multiple dysarthric speakers simultaneously, which reduces per-patient data dependence and speaker-specific overfitting.

Findings

01

Up to 13.15% lower word error rate (WER) than single-speaker fine-tuning.

02

Multi-speaker fine-tuning improves generalization and target-speaker accuracy.

03

The approach mitigates speaker-specific overfitting and reduces data requirements.
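
For readers unfamiliar with the metric behind the first finding, here is a minimal sketch of how WER is commonly computed: word-level edit distance divided by the reference length. The function names and the numbers in the usage lines are illustrative assumptions, not taken from the paper. The summary does not say whether the 13.15% figure is an absolute or a relative reduction, so the sketch shows both conventions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in WER, in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline WER that was removed."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical numbers, for illustration only (not results from the paper).
baseline, improved = 0.40, 0.30
abs_gain = absolute_reduction(baseline, improved)
rel_gain = relative_reduction(baseline, improved)
```

The two conventions can give very different headline numbers for the same pair of systems, so the paper itself should be checked before the figure is compared with other work.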

Abstract

Dysarthric speech recognition faces challenges from severity variations and disparities relative to normal speech. Conventional approaches take an ASR model pre-trained on normal speech and fine-tune it separately for each patient to prevent feature conflicts. Counter-intuitively, experiments reveal that multi-speaker fine-tuning (simultaneously on multiple dysarthric speakers) improves recognition of individual speech patterns. This strategy enhances generalization via broader pathological feature learning, mitigates speaker-specific overfitting, reduces per-patient data dependence, and improves target-speaker accuracy, achieving up to 13.15% lower WER than single-speaker fine-tuning.
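
The difference between the two fine-tuning setups can be sketched at the data-selection level. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Utterance` record, the speaker IDs, and both helper functions are hypothetical, and the fine-tuning step itself is omitted.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Utterance:
    speaker_id: str   # hypothetical speaker label, e.g. "S01"
    audio_path: str   # hypothetical path to the recording
    transcript: str


def single_speaker_train_set(
    utterances: Iterable[Utterance], target: str
) -> List[Utterance]:
    """Conventional setup: fine-tune only on the target patient's recordings."""
    return [u for u in utterances if u.speaker_id == target]


def multi_speaker_train_set(
    utterances: Iterable[Utterance], speakers: Iterable[str]
) -> List[Utterance]:
    """Cross-learning setup: pool recordings from several dysarthric speakers."""
    wanted = set(speakers)
    return [u for u in utterances if u.speaker_id in wanted]


# Illustrative corpus; file names and transcripts are made up.
corpus = [
    Utterance("S01", "s01_001.wav", "open the door"),
    Utterance("S01", "s01_002.wav", "turn on the light"),
    Utterance("S02", "s02_001.wav", "open the door"),
    Utterance("S03", "s03_001.wav", "call my family"),
]

single = single_speaker_train_set(corpus, target="S01")
pooled = multi_speaker_train_set(corpus, speakers=["S01", "S02", "S03"])
```

In both setups the model would be evaluated on held-out recordings from the target speaker. Only the training pool changes, which is what makes the reported comparison a like-for-like one.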
