Structured Speaker-Deficiency Adaptation of Foundation Models for Dysarthric and Elderly Speech Recognition

Shujie Hu; Xurong Xie; Mengzhe Geng; Jiajun Deng; Zengrui Jin; Tianzi Wang; Mingyu Cui; Guinan Li; Zhaoqing Li; Helen Meng; Xunying Liu

arXiv:2412.18832·eess.AS·December 30, 2024

Open Access

TL;DR

This paper introduces structured speaker-deficiency adaptation methods for speech foundation models, significantly improving recognition accuracy for dysarthric and elderly speech by reducing bias and modeling variability.

Contribution

It proposes novel adaptive fine-tuning techniques that use separate adapters for speaker identity and speech deficiency (impairment severity or aging-induced neurocognitive decline), improving the robustness and generalization of speech foundation models.

Findings

01

Consistent WER reductions up to 3.01% absolute on dysarthric speech

02

Achieves the lowest published WER of 19.45% on UASpeech

03

Models outperform baselines with no adapters or shared adapters

Abstract

Data-intensive fine-tuning of speech foundation models (SFMs) to scarce and diverse dysarthric and elderly speech leads to data bias and poor generalization to unseen speakers. This paper proposes novel structured speaker-deficiency adaptation approaches for SSL pre-trained SFMs on such data. Speaker- and speech-deficiency-invariant SFMs were constructed in their supervised adaptive fine-tuning stage to reduce undue bias toward training-data speakers, and serve as a more neutral and robust starting point for test-time unsupervised adaptation. Speech variability attributed to speaker identity and speech impairment severity, or to aging-induced neurocognitive decline, is modelled using separate adapters that can be combined to model any seen or unseen speaker. Experiments on the UASpeech dysarthric and DementiaBank Pitt elderly speech corpora suggest structured speaker-deficiency…
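The core idea of keeping speaker and deficiency adapters separate, then combining them per speaker, can be illustrated with a minimal sketch. Everything below is an assumption rather than the authors' implementation: the bottleneck adapter form, summing the two adapter outputs onto a frozen SFM hidden state, indexing deficiency adapters by UASpeech-style intelligibility groups, and falling back to the deficiency adapter alone for an unseen speaker. The classes `BottleneckAdapter` and `StructuredAdapterLayer` are hypothetical names used only for illustration.

```python
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class BottleneckAdapter:
    """Hypothetical adapter computing an additive correction up(relu(down(h)))."""

    def __init__(self, dim: int, bottleneck: int, seed: int) -> None:
        rng = random.Random(seed)  # seeded so the sketch is deterministic
        self.down = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(bottleneck)]
        self.up = [[rng.gauss(0.0, 0.1) for _ in range(bottleneck)] for _ in range(dim)]

    def delta(self, h: Vector) -> Vector:
        z = [max(0.0, a) for a in matvec(self.down, h)]  # ReLU bottleneck
        return matvec(self.up, z)


class StructuredAdapterLayer:
    """Assumed structure: one adapter per training speaker, one per deficiency level."""

    def __init__(self, dim: int, bottleneck: int,
                 speakers: List[str], deficiency_levels: List[str]) -> None:
        self.speaker: Dict[str, BottleneckAdapter] = {
            s: BottleneckAdapter(dim, bottleneck, seed=i) for i, s in enumerate(speakers)
        }
        offset = len(speakers)
        self.deficiency: Dict[str, BottleneckAdapter] = {
            d: BottleneckAdapter(dim, bottleneck, seed=offset + i)
            for i, d in enumerate(deficiency_levels)
        }

    def forward(self, h: Vector, speaker_id: str, deficiency_level: str) -> Vector:
        out = list(h)  # residual path: the frozen SFM hidden state
        spk = self.speaker.get(speaker_id)
        if spk is not None:  # assumption: unseen speakers get no speaker adapter
            out = [o + d for o, d in zip(out, spk.delta(h))]
        # The deficiency adapter applies to seen and unseen speakers alike.
        out = [o + d for o, d in zip(out, self.deficiency[deficiency_level].delta(h))]
        return out


layer = StructuredAdapterLayer(dim=4, bottleneck=2,
                               speakers=["spk_a", "spk_b"],
                               deficiency_levels=["very_low", "low", "mid", "high"])
h = [0.5, -1.0, 0.25, 2.0]
seen = layer.forward(h, "spk_b", "low")        # speaker + deficiency adapters
unseen = layer.forward(h, "new_spk", "low")    # deficiency adapter only
```

Summing independent adapter outputs keeps the two sources of variability factorized, so any severity adapter can be paired with any speaker adapter. The abstract also describes test-time unsupervised adaptation for unseen speakers, which this sketch does not model.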

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders

Methods: Sparse Evolutionary Training