Self-supervised ASR Models and Features For Dysarthric and Elderly Speech Recognition
Shujie Hu, Xurong Xie, Mengzhe Geng, Zengrui Jin, Jiajun Deng, Guinan Li, Yi Wang, Mingyu Cui, Tianzi Wang, Helen Meng, Xunying Liu

TL;DR
This paper investigates integrating domain fine-tuned self-supervised speech models into ASR systems to improve recognition accuracy for dysarthric and elderly speech, addressing data scarcity and mismatch issues.
Contribution
It introduces methods for combining SSL features with traditional ASR systems and demonstrates significant performance improvements across multiple dysarthric and elderly speech datasets.
Findings
Significant WER/CER reductions across all datasets
Improved Alzheimer's detection accuracy
Effective multi-modal ASR system development
Abstract
Self-supervised learning (SSL) based speech foundation models have been applied to a wide range of ASR tasks. However, their application to dysarthric and elderly speech via data-intensive parameter fine-tuning is confronted by in-domain data scarcity and mismatch. To this end, this paper explores a series of approaches to integrate domain fine-tuned SSL pre-trained models and their features into TDNN and Conformer ASR systems for dysarthric and elderly speech recognition. These include: a) input feature fusion between standard acoustic frontends and domain fine-tuned SSL speech representations; b) frame-level joint decoding between TDNN systems separately trained using standard acoustic features alone and those with additional domain fine-tuned SSL features; and c) multi-pass decoding involving the TDNN/Conformer system outputs to be rescored using domain fine-tuned pre-trained ASR…
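As a rough illustration of approaches a) and b) in the abstract, the sketch below fuses per-frame features and combines frame-level scores from two systems. It is a minimal sketch under stated assumptions, not the authors' implementation: fusion is assumed to be frame-wise concatenation of time-aligned features, joint decoding is assumed to be a log-linear interpolation of frame-level log posteriors, and the function names and the `weight` parameter are hypothetical.

```python
import math
from typing import List

Frame = List[float]


def fuse_features(acoustic: List[Frame], ssl: List[Frame]) -> List[Frame]:
    """Concatenate acoustic and SSL feature vectors frame by frame.

    Assumption: both streams are already aligned to the same frame rate.
    """
    if len(acoustic) != len(ssl):
        raise ValueError("frame counts differ; align the two streams first")
    return [a + s for a, s in zip(acoustic, ssl)]


def log_softmax(scores: Frame) -> Frame:
    """Numerically stable log-softmax over one frame of scores."""
    peak = max(scores)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in scores))
    return [x - log_sum for x in scores]


def joint_frame_scores(
    logp_a: List[Frame], logp_b: List[Frame], weight: float = 0.5
) -> List[Frame]:
    """Interpolate two systems' frame-level log posteriors and renormalise.

    Assumption: `weight` is a hypothetical tuning parameter giving the
    share of system B; both systems score the same output units.
    """
    if len(logp_a) != len(logp_b):
        raise ValueError("the two systems must score the same frames")
    combined = []
    for frame_a, frame_b in zip(logp_a, logp_b):
        mixed = [(1.0 - weight) * x + weight * y for x, y in zip(frame_a, frame_b)]
        combined.append(log_softmax(mixed))
    return combined


if __name__ == "__main__":
    # Toy data: 2 frames, 2-dim acoustic features, 1-dim SSL features.
    fused = fuse_features([[1.0, 2.0], [3.0, 4.0]], [[0.5], [0.25]])
    print(fused)

    # Toy posteriors over 2 output units from two hypothetical systems.
    system_a = [log_softmax([2.0, 0.0])]
    system_b = [log_softmax([0.0, 2.0])]
    print(joint_frame_scores(system_a, system_b, weight=0.5))
```

In practice the interpolation weight would be tuned on held-out data, and the fused features would feed the acoustic model in place of the standard frontend alone.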
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders
Methods: XLSR
