Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation
Huimeng Wang, Zengrui Jin, Mengzhe Geng, Shujie Hu, Guinan Li, Tianzi Wang, Haoning Xu, Xunying Liu

TL;DR
This paper investigates various data augmentation techniques, especially GAN-based adversarial augmentation, to improve the fine-tuning of pre-trained ASR models for recognizing dysarthric speech, achieving significant WER reductions.
Contribution
It introduces a novel Spectral basis GAN-based adversarial data augmentation method for dysarthric speech recognition, outperforming traditional augmentation techniques.
Findings
GAN-based augmentation yields WER reductions of up to 2.01% absolute.
Spectral basis GAN improves robustness over conventional methods.
Achieved lowest published WER of 16.53% on UASpeech.
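The findings above are all stated in terms of word error rate (WER). As a reminder of how that metric is usually computed, here is a minimal sketch: word-level edit distance divided by the reference length, plus a helper showing the difference between an absolute and a relative reduction. The function names are hypothetical and the numbers in the usage note are made up for illustration; none of this is the paper's scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline_wer: float, new_wer: float) -> tuple[float, float]:
    """Return (absolute, relative) reduction; inputs in the same unit, e.g. percent."""
    absolute = baseline_wer - new_wer
    relative = absolute / baseline_wer
    return absolute, relative
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words. With made-up figures, `wer_reduction(20.0, 18.0)` is a 2.0-point absolute reduction, which is a 10% relative reduction.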
Abstract
Automatic recognition of dysarthric speech remains a highly challenging task to date. Neuro-motor conditions and co-occurring physical disabilities create difficulty in large-scale data collection for ASR system development. Adapting SSL pre-trained ASR models to limited dysarthric speech via data-intensive parameter fine-tuning leads to poor generalization. To this end, this paper presents an extensive comparative study of various data augmentation approaches to improve the robustness of pre-trained ASR model fine-tuning to dysarthric speech. These include: a) conventional speaker-independent perturbation of impaired speech; b) speaker-dependent speed perturbation, or GAN-based adversarial perturbation of normal, control speech based on their time alignment against parallel dysarthric speech; c) novel Spectral basis GAN-based adversarial data augmentation operating on non-parallel…
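Speed perturbation, named in the abstract as one of the augmentation baselines, is commonly implemented by resampling the waveform so that it plays faster or slower. The sketch below does this with plain linear interpolation in pure Python. It is an illustration only: the function name `speed_perturb` and the factors used are assumptions, and the paper's own pipeline (including how a speaker-dependent factor is derived from time alignment) is not reproduced here.

```python
def speed_perturb(samples: list[float], factor: float) -> list[float]:
    """Resample a waveform so it plays `factor` times faster.

    factor > 1 shortens the signal, factor < 1 lengthens it. Like
    resampling-based speed perturbation in general, this changes
    both tempo and pitch.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(samples)
    if n_in < 2:
        return list(samples)

    n_out = max(2, round(n_in / factor))
    # Spacing between output samples, measured in input-sample units,
    # chosen so the first and last input samples are both kept.
    step = (n_in - 1) / (n_out - 1)

    out = []
    for k in range(n_out):
        pos = k * step
        i = min(int(pos), n_in - 2)  # left neighbour, clamped to stay in range
        frac = pos - i
        # Linear interpolation between the two neighbouring input samples.
        out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out
```

A production system would normally use a band-limited resampler rather than linear interpolation, which can introduce aliasing; the simple version is used here only to keep the example self-contained.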
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Speech and Audio Processing
