On-the-Fly Feature Based Rapid Speaker Adaptation for Dysarthric and Elderly Speech Recognition

Mengzhe Geng; Xurong Xie; Rongfeng Su; Jianwei Yu; Zengrui Jin; Tianzi Wang; Shujie Hu; Zi Ye; Helen Meng; Xunying Liu

arXiv:2203.14593·eess.AS·May 30, 2023

Open Access

TL;DR

This paper introduces two novel, data-efficient, feature-based on-the-fly speaker adaptation methods that significantly improve speech recognition accuracy for dysarthric and elderly speakers, addressing challenges of speaker heterogeneity and data scarcity.

Contribution

The paper proposes two new on-the-fly speaker adaptation techniques, variance-regularized spectral basis embedding and spectral feature driven f-LHUC transforms, for improved recognition of diverse speech.

Findings

01 Significant WER reduction over baseline systems

02 Consistent outperformance of offline LHUC adaptation

03 Effective handling of speaker heterogeneity and data scarcity

Abstract

Accurate recognition of dysarthric and elderly speech remains a challenging task to date. Speaker-level heterogeneity attributed to accent or gender, when aggregated with age and speech impairment, creates large diversity among these speakers. The scarcity of speaker-level data limits the practical use of data-intensive, model-based speaker adaptation methods. To this end, this paper proposes two novel forms of data-efficient, feature-based on-the-fly speaker adaptation methods: variance-regularized spectral basis embedding (SVR) and spectral feature driven f-LHUC transforms. Experiments conducted on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest the proposed on-the-fly speaker adaptation approaches consistently outperform baseline iVector adapted hybrid DNN/TDNN and E2E Conformer systems by statistically significant WER reductions of 2.48%-2.85% absolute (7.92%-8.06%…
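
The abstract reports its gains as absolute WER reductions, followed by a second percentage range that is cut off in the text above. As background on how such figures are usually derived, the following is a minimal sketch of word error rate (word-level edit distance divided by reference length) and of the difference between an absolute and a relative reduction. The function names `word_error_rate` and `wer_reduction` and all example values are hypothetical and are not taken from the paper; this is not the authors' scoring pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline_wer: float, adapted_wer: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction, both as fractions."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    absolute = baseline_wer - adapted_wer
    relative = absolute / baseline_wer
    return absolute, relative


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion in six words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.4f}")

    # Hypothetical system scores, chosen only to illustrate the arithmetic.
    absolute, relative = wer_reduction(baseline_wer=0.30, adapted_wer=0.27)
    print(f"absolute: {absolute:.2%}, relative: {relative:.2%}")
```

An absolute reduction is a difference in percentage points, while a relative reduction divides that difference by the baseline WER, so the same absolute gain corresponds to a larger relative gain when the baseline is already low.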

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Phonetics and Phonology Research