Recent Progress in the CUHK Dysarthric Speech Recognition System

Shansong Liu; Mengzhe Geng; Shoukang Hu; Xurong Xie; Mingyu Cui; Jianwei Yu; Xunying Liu; Helen Meng

arXiv:2201.05845·eess.AS·March 1, 2022

TL;DR

This paper reviews recent work at CUHK on dysarthric speech recognition, combining neural architecture search, data augmentation, speaker adaptation, and audio-visual modelling to lower word error rates on the UASpeech dysarthric speech corpus.

Contribution

Combines neural architecture search, data augmentation using spectra-temporal perturbation, model-based speaker adaptation, and cross-domain generation of visual features within an audio-visual speech recognition (AVSR) framework, yielding the lowest published WER on the UASpeech test set.

Findings

01

Lowest published WER of 25.21% on UASpeech test set

02

5.4% absolute (17.6% relative) WER reduction over the CUHK 2018 dysarthric speech recognition system

03

Rapid speaker adaptation with only 3.06 seconds of speech
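
The findings above are stated in terms of word error rate (WER) and an absolute reduction in WER. As a rough illustration of what those terms mean, here is a minimal Python sketch that computes WER from a word-level edit distance and contrasts absolute with relative reduction. The function names are hypothetical and this is not the paper's scoring tool. The baseline of 30.61% is not quoted on this page; it is an assumption inferred by adding the reported 5.4% absolute reduction to the reported 25.21% WER.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def absolute_reduction(baseline_wer: float, new_wer: float) -> float:
    """Difference in percentage points between two WER values."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Reduction expressed as a percentage of the baseline WER."""
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Toy transcripts: one substitution and one deletion over four reference words.
    print(word_error_rate("a b c d", "a x c"))

    # Assumed baseline (inferred, not quoted): 25.21 + 5.4 = 30.61.
    baseline, reported = 30.61, 25.21
    print(round(absolute_reduction(baseline, reported), 2))
    print(round(relative_reduction(baseline, reported), 2))
```

The distinction matters when comparing systems: the same absolute reduction is a larger relative gain when the baseline WER is lower.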

Abstract

Despite the rapid progress of automatic speech recognition (ASR) technologies in the past few decades, recognition of disordered speech remains a highly challenging task to date. Disordered speech presents a wide spectrum of challenges to current data-intensive deep neural network (DNN) based ASR technologies that predominantly target normal speech. This paper presents recent research efforts at the Chinese University of Hong Kong (CUHK) to improve the performance of disordered speech recognition systems on the largest publicly available UASpeech dysarthric speech corpus. A set of novel modelling techniques including neural architectural search, data augmentation using spectra-temporal perturbation, model-based speaker adaptation and cross-domain generation of visual features within an audio-visual speech recognition (AVSR) system framework were employed to address the above…
