Speaker Adaptation Using Spectro-Temporal Deep Features for Dysarthric and Elderly Speech Recognition

Mengzhe Geng; Xurong Xie; Zi Ye; Tianzi Wang; Guinan Li; Shujie Hu; Xunying Liu; Helen Meng

arXiv:2202.10290 · eess.AS · March 18, 2022

TL;DR

This paper introduces spectro-temporal deep features derived from SVD-based speech spectrum decomposition to improve speaker adaptation in speech recognition systems for dysarthric and elderly speech, achieving significant word error rate (WER) reductions.

Contribution

It proposes novel spectro-temporal deep embedding features for speaker adaptation that outperform traditional iVector and xVector adaptation on dysarthric and elderly speech recognition tasks.

Findings

01

Up to 2.63% absolute WER reduction over baseline methods.

02

Consistent improvements with additional model-based adaptation techniques.

03

Lowest published WER at the time of publication: 25.05% on the UASpeech dysarthric speech dataset.
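
The findings mix a raw WER (25.05%) with an absolute WER reduction (2.63%). The sketch below, assuming the standard definition of WER as word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, computes both kinds of quantity and shows how an absolute reduction (a difference in percentage points) differs from a relative one. The function names are hypothetical, the numbers in the usage lines are made up for illustration, and scoring toolkits used in ASR research may apply extra text normalization not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: 100 * (S + D + I) / N, N = number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the first i-1 ref words and first j hyp words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of reference word r
                curr[j - 1] + 1,         # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (zero cost if words match)
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def wer_reduction(baseline_wer: float, new_wer: float) -> tuple[float, float]:
    """Return (absolute reduction in percentage points, relative reduction in %)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    absolute = baseline_wer - new_wer
    return absolute, 100.0 * absolute / baseline_wer


# Hypothetical inputs for illustration only (not results from the paper).
print(word_error_rate("a b c d", "a x c"))  # → 50.0
print(wer_reduction(30.0, 27.0))            # → (3.0, 10.0)
```

The same absolute gain corresponds to a larger relative gain when the baseline WER is lower, which is why papers often report both.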

Abstract

Despite the rapid progress of automatic speech recognition (ASR) technologies targeting normal speech in recent decades, accurate recognition of dysarthric and elderly speech remains a highly challenging task to date. Sources of heterogeneity commonly found in normal speech, such as accent or gender, when further compounded with variability over age and speech pathology severity level, create large diversity among speakers. To this end, speaker adaptation techniques play a key role in personalizing ASR systems for such users. Motivated by the spectro-temporal level differences between dysarthric, elderly and normal speech that systematically manifest in articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies, novel spectro-temporal subspace basis deep embedding features derived using SVD speech spectrum decomposition are proposed…
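
The abstract describes the features only at a high level. As a rough illustration of the decomposition step alone, the sketch below applies a thin SVD to one utterance's spectrum matrix (frequency bins × frames) and keeps the top-k left singular vectors as spectral bases and the matching right singular vectors as temporal bases. This is a sketch under stated assumptions, not the authors' pipeline: the function name `spectral_temporal_bases`, the value of k, the input shape, the sign-normalization rule and the flattened utterance vector are all hypothetical, and the deep embedding network the paper builds on top of the bases is not shown.

```python
import numpy as np


def spectral_temporal_bases(spectrum: np.ndarray, k: int):
    """Decompose one utterance's spectrum with a thin SVD (illustrative sketch).

    spectrum: (n_freq, n_frames) matrix, e.g. log mel filterbank energies
              (assumed input; any real 2-D matrix works).
    k:        number of leading bases to keep (hypothetical hyperparameter).
    Returns (spectral_bases, temporal_bases, singular_values):
      spectral_bases  (n_freq, k)   leading left singular vectors
      temporal_bases  (n_frames, k) leading right singular vectors
      singular_values (k,)          in descending order
    """
    if spectrum.ndim != 2:
        raise ValueError("spectrum must be a 2-D (n_freq, n_frames) matrix")
    if not 1 <= k <= min(spectrum.shape):
        raise ValueError("k must be between 1 and min(n_freq, n_frames)")

    # Thin SVD: spectrum = U @ diag(s) @ Vt, with s sorted in descending order.
    u, s, vt = np.linalg.svd(spectrum, full_matrices=False)
    spectral = u[:, :k].copy()
    temporal = vt[:k, :].T.copy()

    # Singular vectors are only defined up to sign; flip each (u_i, v_i) pair
    # together so the largest-magnitude spectral entry is positive. Flipping
    # both vectors of a pair leaves the reconstruction unchanged.
    rows = np.argmax(np.abs(spectral), axis=0)
    signs = np.sign(spectral[rows, np.arange(k)])
    signs[signs == 0] = 1.0
    spectral *= signs
    temporal *= signs
    return spectral, temporal, s[:k]


# Usage on a random stand-in for a 40-bin x 300-frame spectrum.
rng = np.random.default_rng(0)
spec = rng.standard_normal((40, 300))
spectral, temporal, sv = spectral_temporal_bases(spec, k=3)
# Hypothetical utterance-level vector: the flattened spectral bases, which a
# downstream network could map to a compact embedding.
utterance_vector = spectral.reshape(-1)
print(spectral.shape, temporal.shape, utterance_vector.shape)  # → (40, 3) (300, 3) (120,)
```

Keeping only the leading bases summarizes each utterance with a fixed-size description of its dominant spectral shapes regardless of its length, which is one plausible reason such bases suit speaker-level features.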

Taxonomy

Topics: Voice and Speech Disorders · Dysphagia Assessment and Management · Phonetics and Phonology Research