Recent Progress in the CUHK Dysarthric Speech Recognition System
Shansong Liu, Mengzhe Geng, Shoukang Hu, Xurong Xie, Mingyu Cui, Jianwei Yu, Xunying Liu, Helen Meng

TL;DR
This paper reviews recent work at CUHK on dysarthric speech recognition, combining neural architecture search, data augmentation, speaker adaptation, and audio-visual modelling to reduce word error rates on the UASpeech disordered speech corpus.
Contribution
Introduces innovative modeling methods including neural architecture search, data augmentation, and audio-visual integration, achieving state-of-the-art results in dysarthric speech recognition.
Findings
Lowest published WER of 25.21% on the UASpeech test set
5.4% absolute WER reduction over previous systems
Rapid speaker adaptation with only 3.06 seconds of speech
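
The first two findings quote an absolute WER reduction, which is easy to confuse with a relative one. The sketch below shows the arithmetic. It assumes the 25.21% figure is the result after the 5.4% absolute reduction, so the baseline WER in the code is derived for illustration and is not quoted from this page. The function name `wer_reduction` is hypothetical.

```python
def wer_reduction(baseline_wer, new_wer):
    """Return (absolute, relative) WER reduction, both expressed in percent."""
    absolute = baseline_wer - new_wer             # percentage points
    relative = 100.0 * absolute / baseline_wer    # share of the baseline error removed
    return absolute, relative


new_wer = 25.21        # WER quoted in the findings above
absolute_gain = 5.4    # absolute reduction quoted in the findings above

# Assumption: the baseline is the new WER plus the absolute gain (derived, not quoted).
baseline_wer = new_wer + absolute_gain

absolute, relative = wer_reduction(baseline_wer, new_wer)
print(f"baseline {baseline_wer:.2f}%, absolute {absolute:.2f} points, relative {relative:.1f}%")
```

Under that assumption, a 5.4-point absolute drop corresponds to removing roughly 17.6% of the baseline system's errors.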
Abstract
Despite the rapid progress of automatic speech recognition (ASR) technologies in the past few decades, recognition of disordered speech remains a highly challenging task to date. Disordered speech presents a wide spectrum of challenges to current data intensive deep neural networks (DNNs) based ASR technologies that predominantly target normal speech. This paper presents recent research efforts at the Chinese University of Hong Kong (CUHK) to improve the performance of disordered speech recognition systems on the largest publicly available UASpeech dysarthric speech corpus. A set of novel modelling techniques including neural architectural search, data augmentation using spectra-temporal perturbation, model based speaker adaptation and cross-domain generation of visual features within an audio-visual speech recognition (AVSR) system framework were employed to address the above…
