Homogeneous Speaker Features for On-the-Fly Dysarthric and Elderly Speaker Adaptation

Mengzhe Geng; Xurong Xie; Jiajun Deng; Zengrui Jin; Guinan Li; Tianzi Wang; Shujie Hu; Zhaoqing Li; Helen Meng; Xunying Liu

arXiv:2407.06310 · cs.SD · July 10, 2024

TL;DR

This paper introduces two novel, data-efficient methods for rapid on-the-fly speaker adaptation in ASR systems targeting dysarthric and elderly speech, significantly improving accuracy and speed over existing techniques.

Contribution

It proposes VR-SBE features and f-LHUC transforms that enhance speaker homogeneity and adaptation efficiency for dysarthric and elderly speech recognition.

Findings

01. Achieved up to 5.32% absolute WER reduction over baseline methods.

02. Operates with real-time factors up to 33.6 times faster than xVector adaptation.

03. Demonstrated state-of-the-art WER of 23.33% on UASpeech.

Abstract

The application of data-intensive automatic speech recognition (ASR) technologies to dysarthric and elderly adult speech is confronted by their mismatch against healthy and nonaged voices, data scarcity and large speaker-level variability. To this end, this paper proposes two novel data-efficient methods to learn homogeneous dysarthric and elderly speaker-level features for rapid, on-the-fly test-time adaptation of DNN/TDNN and Conformer ASR models. These include: 1) speaker-level variance-regularized spectral basis embedding (VR-SBE) features that exploit a special regularization term to enforce homogeneity of speaker features in adaptation; and 2) feature-based learning hidden unit contributions (f-LHUC) transforms that are conditioned on VR-SBE features. Experiments are conducted on four tasks across two languages: the English UASpeech and TORGO dysarthric speech datasets, the…
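
The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: this excerpt does not give the exact form of the VR-SBE regularizer or of the network that produces the f-LHUC transforms, so both functions below are assumptions. `within_speaker_variance` is a hypothetical stand-in for a homogeneity penalty (it is small when a speaker's per-utterance features stay close to that speaker's mean), and `lhuc_scale` applies the common LHUC parameterization, in which each hidden unit is scaled by an amplitude `2 * sigmoid(r)`, with `r` predicted here by an assumed linear map (`W`, `b`) from a speaker feature vector.

```python
import numpy as np


def within_speaker_variance(features, speaker_ids):
    """Mean squared distance of each utterance feature to its speaker's mean.

    Hypothetical homogeneity penalty: 0 when every utterance of a speaker
    yields the same feature vector.
    """
    features = np.asarray(features, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    total = 0.0
    for spk in np.unique(speaker_ids):
        spk_feats = features[speaker_ids == spk]
        total += np.sum((spk_feats - spk_feats.mean(axis=0)) ** 2)
    return total / len(features)


def lhuc_scale(hidden, speaker_feature, W, b):
    """Scale hidden activations by amplitudes predicted from a speaker feature.

    hidden:          (frames, units) hidden-layer outputs
    speaker_feature: (k,) speaker-level feature vector
    W, b:            assumed linear regression parameters, shapes (k, units), (units,)
    """
    r = speaker_feature @ W + b
    amplitudes = 2.0 / (1.0 + np.exp(-r))  # 2 * sigmoid(r), each value in (0, 2)
    return hidden * amplitudes             # broadcast over frames
```

Because the amplitudes come from a single forward computation on the speaker feature rather than from gradient-based estimation on adaptation data, a scheme of this shape can be applied at test time without a separate adaptation pass, which is the sense of "on-the-fly" used in the abstract.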

Taxonomy

Topics: Voice and Speech Disorders