Enhancing Pre-trained ASR System Fine-tuning for Dysarthric Speech Recognition using Adversarial Data Augmentation

Huimeng Wang; Zengrui Jin; Mengzhe Geng; Shujie Hu; Guinan Li; Tianzi Wang; Haoning Xu; Xunying Liu

arXiv:2401.00662·cs.SD·January 2, 2024·1 cites

TL;DR

This paper investigates various data augmentation techniques, especially GAN-based adversarial augmentation, to improve the fine-tuning of pre-trained ASR models for recognizing dysarthric speech, achieving significant WER reductions.

Contribution

It introduces a novel Spectral basis GAN-based adversarial data augmentation method for dysarthric speech recognition, outperforming traditional augmentation techniques.

Findings

01 GAN-based augmentation yields up to 2.01% WER reduction.

02 Spectral basis GAN improves robustness over conventional methods.

03 Achieved lowest published WER of 16.53% on UASpeech.
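
The findings above are reported as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard WER computation: the word-level edit distance between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and the paper's own scoring setup may differ in text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Illustrative example (made-up sentences): two word errors over five words.
example_wer = word_error_rate("the cat sat on mat", "the cat on the mat")
```

Because the denominator is the reference length, WER can exceed 100% when the hypothesis contains many insertions.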

Abstract

Automatic recognition of dysarthric speech remains a highly challenging task to date. Neuro-motor conditions and co-occurring physical disabilities create difficulty in large-scale data collection for ASR system development. Adapting SSL pre-trained ASR models to limited dysarthric speech via data-intensive parameter fine-tuning leads to poor generalization. To this end, this paper presents an extensive comparative study of various data augmentation approaches to improve the robustness of pre-trained ASR model fine-tuning to dysarthric speech. These include: a) conventional speaker-independent perturbation of impaired speech; b) speaker-dependent speed perturbation, or GAN-based adversarial perturbation of normal, control speech based on their time alignment against parallel dysarthric speech; c) novel Spectral basis GAN-based adversarial data augmentation operating on non-parallel…
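
Speed perturbation, mentioned in the abstract as a conventional augmentation approach, can be sketched as resampling a waveform so that it plays faster or slower. The snippet below is a minimal illustration using linear interpolation in NumPy. The function name `speed_perturb` and the factors 0.9, 1.0 and 1.1 are assumptions chosen for illustration (these factors are a common convention in ASR toolkits); the paper's exact perturbation settings, and the way speaker-dependent factors are derived from time alignments, are not reproduced here.

```python
import numpy as np


def speed_perturb(waveform: np.ndarray, factor: float) -> np.ndarray:
    """Return `waveform` resampled to play `factor` times faster.

    factor > 1 shortens the signal, factor < 1 lengthens it.
    Linear interpolation is used here for simplicity; production
    pipelines typically use a band-limited resampler instead.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_in = len(waveform)
    n_out = max(1, int(round(n_in / factor)))
    # Positions in the original signal that each output sample reads from.
    src_positions = np.clip(np.arange(n_out) * factor, 0, n_in - 1)
    return np.interp(src_positions, np.arange(n_in), waveform)


# Illustrative use: one second of a 220 Hz tone at 16 kHz (synthetic data).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 220.0 * t)

# Speaker-independent perturbation applies the same factors to every speaker.
augmented = {factor: speed_perturb(tone, factor) for factor in (0.9, 1.0, 1.1)}
```

A speaker-dependent variant would replace the fixed factors with a per-speaker value, for example one estimated from how much slower a given dysarthric speaker is than control speakers on parallel utterances.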

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Speech and Audio Processing