On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition
Mengzhe Geng, Xurong Xie, Rongfeng Su, Jianwei Yu, Zengrui Jin, Tianzi Wang, Shujie Hu, Zi Ye, Helen Meng, Xunying Liu

TL;DR
This paper introduces two novel, data-efficient, feature-based on-the-fly speaker adaptation methods that significantly improve speech recognition accuracy for dysarthric and elderly speakers, addressing challenges of speaker heterogeneity and data scarcity.
Contribution
The paper proposes two new on-the-fly speaker adaptation techniques, variance-regularized spectral basis embedding and spectral feature driven f-LHUC transforms, for improved recognition of diverse speech.
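As background for the f-LHUC transforms named above, the following is a minimal sketch of the general LHUC idea (learning hidden unit contributions): each hidden activation is rescaled by a speaker-dependent amplitude, commonly parameterised as 2·sigmoid(r). It is not the paper's implementation. The function name `lhuc_scale` and the example vectors are hypothetical, and the scaling parameters are simply passed in here, whereas a feature-driven variant would predict them on the fly from the input features.

```python
import math


def lhuc_scale(hidden, speaker_params):
    """Rescale hidden activations with LHUC-style amplitudes.

    Each unit h is multiplied by 2 * sigmoid(r), so the amplitude lies in (0, 2)
    and r = 0 leaves the unit unchanged. Both arguments are plain lists of floats
    (hypothetical stand-ins for one layer's activations and its scaling vector).
    """
    if len(hidden) != len(speaker_params):
        raise ValueError("hidden and speaker_params must have the same length")
    return [h * 2.0 / (1.0 + math.exp(-r)) for h, r in zip(hidden, speaker_params)]


# Illustrative values only, not taken from the paper.
hidden = [0.5, -1.0, 2.0]
neutral = [0.0, 0.0, 0.0]          # amplitude 1.0 for every unit
adapted = [1.5, -1.5, 0.0]         # boost unit 0, damp unit 1, keep unit 2

print(lhuc_scale(hidden, neutral))
print(lhuc_scale(hidden, adapted))
```

Because the amplitude is bounded between 0 and 2, a scaling vector can only attenuate or modestly amplify a unit, which is one reason this family of transforms needs few parameters per speaker.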
Findings
Significant WER reduction over baseline systems
Consistent outperformance of offline LHUC adaptation
Effective handling of speaker heterogeneity and data scarcity
Abstract
Accurate recognition of dysarthric and elderly speech remains a challenging task to date. Speaker-level heterogeneity attributed to accent or gender, when aggregated with age and speech impairment, creates large diversity among these speakers. Scarcity of speaker-level data limits the practical use of data-intensive model-based speaker adaptation methods. To this end, this paper proposes two novel forms of data-efficient, feature-based on-the-fly speaker adaptation methods: variance-regularized spectral basis embedding (SVR) and spectral feature driven f-LHUC transforms. Experiments conducted on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest the proposed on-the-fly speaker adaptation approaches consistently outperform baseline iVector adapted hybrid DNN/TDNN and E2E Conformer systems by statistically significant WER reductions of 2.48%-2.85% absolute (7.92%-8.06%…
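Since the abstract quotes word error rate (WER) reductions, a short sketch of how WER and its absolute and relative reductions are conventionally computed may help. This is a generic illustration, not the paper's scoring pipeline. The function names and all numbers are hypothetical and are not results from the paper.

```python
def wer(reference, hypothesis):
    """Word error rate in percent: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return 100.0 * prev[-1] / len(ref)


def wer_reduction(baseline_wer, adapted_wer):
    """Return (absolute, relative) WER reduction, both in percent."""
    absolute = baseline_wer - adapted_wer
    relative = 100.0 * absolute / baseline_wer
    return absolute, relative


# Hypothetical numbers for illustration only.
print(wer("the cat sat on the mat", "the cat sat on mat"))
print(wer_reduction(30.0, 27.0))
```

An absolute reduction is a difference in percentage points, while a relative reduction expresses that difference as a fraction of the baseline WER, so the same absolute gain looks larger in relative terms when the baseline is already low.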
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Phonetics and Phonology Research
