Spectro-Temporal Deep Features for Disordered Speech Assessment and Recognition

Mengzhe Geng; Shansong Liu; Jianwei Yu; Xurong Xie; Shoukang Hu; Zi Ye; Zengrui Jin; Xunying Liu; Helen Meng

arXiv:2201.05554 · cs.SD · January 20, 2022


TL;DR

This paper introduces spectro-temporal deep features, derived from singular value decomposition (SVD) of speech spectra, to improve disordered speech recognition and assessment. It demonstrates significant word error rate (WER) reductions on the UASpeech corpus.

Contribution

The paper proposes novel spectro-temporal subspace basis embedding deep features for disordered speech recognition and speaker adaptation, which outperform traditional i-vector methods.

Findings

1. Up to 8.6% relative WER reduction over the baseline.
2. Consistent improvements when combined with data augmentation and LHUC (learning hidden unit contributions) adaptation.
3. The final system attained a 25.6% WER on the UASpeech test set.
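A relative WER reduction is measured against the baseline's own error rate, not in absolute percentage points. Under the standard definition below, an 8.6% relative reduction from a baseline WER of W corresponds to an absolute drop of 0.086 W:

```latex
\Delta_{\text{rel}} = \frac{\mathrm{WER}_{\text{baseline}} - \mathrm{WER}_{\text{system}}}{\mathrm{WER}_{\text{baseline}}} \times 100\%
```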

Abstract

Automatic recognition of disordered speech remains a highly challenging task to date. Sources of variability commonly found in normal speech, including accent, age or gender, are further compounded by the underlying causes of speech impairment and varying severity levels. Together they create large diversity among speakers. For this reason, speaker adaptation techniques play a vital role in current speech recognition systems. Disordered and normal speech differ at the spectro-temporal level, and these differences systematically manifest as articulatory imprecision, decreased volume and clarity, slower speaking rates and increased dysfluencies. Motivated by this, novel spectro-temporal subspace basis embedding deep features derived by SVD of the speech spectrum are proposed to facilitate both accurate speech intelligibility assessment and auxiliary feature based speaker adaptation of state-of-the-art hybrid…
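The decomposition step named in the abstract can be sketched as follows. This sketch assumes a magnitude spectrogram stored as a list of rows, with frequency bins as rows and time frames as columns. Under that layout, left singular vectors act as spectral bases and right singular vectors act as temporal bases. The power-iteration SVD, the sign convention, and the hypothetical `utterance_embedding` helper (which concatenates the top-k spectral bases into a fixed-length vector) are illustrative assumptions, not the authors' method. The summary does not describe how the paper turns these bases into deep features, so that stage is not reproduced here.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def _matvec(m: Matrix, v: List[float]) -> List[float]:
    # Multiply a matrix (list of rows) by a vector.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def _normalize(v: List[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else v


def _fix_sign(u: List[float], v: List[float]):
    # Singular vector pairs are defined only up to a joint sign flip;
    # make the largest-magnitude entry of u positive for reproducible features.
    i = max(range(len(u)), key=lambda j: abs(u[j]))
    if u[i] < 0:
        return [-x for x in u], [-x for x in v]
    return u, v


def spectral_temporal_bases(spec: Matrix, k: int, iters: int = 300, seed: int = 0):
    """Top-k SVD of a (freq x time) spectrogram via power iteration with deflation.

    Returns (sigmas, spectral_bases, temporal_bases), where spectral bases have
    one entry per frequency bin and temporal bases one entry per time frame.
    """
    rng = random.Random(seed)
    a = [row[:] for row in spec]
    n_time = len(a[0])
    sigmas, spectral, temporal = [], [], []
    for _ in range(k):
        at = _transpose(a)
        v = _normalize([rng.gauss(0.0, 1.0) for _ in range(n_time)])
        for _ in range(iters):
            v = _normalize(_matvec(at, _matvec(a, v)))  # power step on A^T A
        av = _matvec(a, v)
        sigma = math.sqrt(sum(x * x for x in av))
        if sigma < 1e-12:
            break  # the remaining matrix is numerically zero
        u, v = _fix_sign([x / sigma for x in av], v)
        sigmas.append(sigma)
        spectral.append(u)
        temporal.append(v)
        # Deflate: subtract the rank-1 component sigma * u v^T.
        a = [[a[i][j] - sigma * u[i] * v[j] for j in range(n_time)]
             for i in range(len(u))]
    return sigmas, spectral, temporal


def utterance_embedding(spec: Matrix, k: int) -> List[float]:
    # Hypothetical fixed-length feature: concatenate the top-k spectral bases.
    # Their length equals the number of frequency bins, independent of duration.
    _, spectral, _ = spectral_temporal_bases(spec, k)
    return [x for basis in spectral for x in basis]
```

The spectral bases are used for the fixed-length vector because their size does not depend on utterance duration, whereas temporal bases grow with the number of frames. A production system would use an optimized SVD routine rather than this pure-Python loop.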
