Improved End-to-End Dysarthric Speech Recognition via Meta-learning Based Model Re-initialization

Disong Wang; Jianwei Yu; Xixin Wu; Lifa Sun; Xunying Liu; Helen Meng

arXiv:2011.01686 · eess.AS · November 4, 2020 · ISCSLP · 6 cites

Open Access

TL;DR

This paper introduces a meta-learning based re-initialization method for end-to-end dysarthric speech recognition models. The method improves speaker adaptation and recognition accuracy when dysarthric speech data is limited and deviates acoustically from normal speech.

Contribution

It extends the MAML and Reptile algorithms to re-initialize the base model for dysarthric speech recognition, enabling faster adaptation and improved performance.

Findings

01. Achieved a 54.2% relative WER reduction over the base model.

02. Outperformed direct fine-tuning methods.

03. Comparable to state-of-the-art hybrid models.
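
The first finding is stated as a relative reduction in word error rate (WER). As a reminder of how such a figure is conventionally computed, here is a minimal sketch. The function name and the input values are hypothetical and chosen only for illustration; this page does not give the absolute WERs behind the 54.2% figure.

```python
def relative_wer_reduction(wer_base: float, wer_new: float) -> float:
    """Relative WER reduction in percent: 100 * (base - new) / base."""
    if wer_base <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (wer_base - wer_new) / wer_base


# Hypothetical WERs (in percent), NOT taken from the paper:
# halving the WER corresponds to a 50% relative reduction.
example = relative_wer_reduction(50.0, 25.0)
```

A relative reduction depends on the baseline, so the same relative figure can correspond to very different absolute WER changes.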

Abstract

Dysarthric speech recognition is a challenging task because dysarthric data is limited and its acoustics deviate significantly from normal speech. Model-based speaker adaptation is a promising approach: the limited dysarthric speech is used to fine-tune a base model, pre-trained on large amounts of normal speech, to obtain speaker-dependent models. However, statistical distribution mismatches between normal and dysarthric speech data limit the adaptation performance of the base model. To address this problem, we propose to re-initialize the base model via meta-learning to obtain a better model initialization. Specifically, we focus on end-to-end models and extend the model-agnostic meta-learning (MAML) and Reptile algorithms to meta-update the base model by repeatedly simulating adaptation to different dysarthric speakers. As a result, the re-initialized model acquires…
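
The idea of meta-updating a base model by repeatedly simulating adaptation to different speakers can be illustrated with a toy Reptile loop. The sketch below is not the paper's method or model: it assumes a two-parameter linear regressor in place of an end-to-end recognizer, treats each synthetic regression task as a stand-in "speaker", and uses the standard first-order Reptile update rather than the extended MAML and Reptile variants the abstract mentions. All names (`sgd_adapt`, `reptile`, `make_task`) and hyperparameters are hypothetical.

```python
import random


def sgd_adapt(theta, data, lr=0.05, steps=5):
    """Simulate adaptation: a few gradient steps on one task's data."""
    w, b = theta
    n = len(data)
    for _ in range(steps):
        gw = gb = 0.0
        for x, y in data:
            err = (w * x + b) - y          # prediction error
            gw += 2 * err * x / n          # d(MSE)/dw
            gb += 2 * err / n              # d(MSE)/db
        w -= lr * gw
        b -= lr * gb
    return (w, b)


def mse(theta, data):
    """Mean squared error of the linear model on one task."""
    w, b = theta
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)


def reptile(theta, tasks, meta_lr=0.1, meta_iters=200, seed=0):
    """Reptile: move the initialization toward each task's adapted weights."""
    rng = random.Random(seed)
    for _ in range(meta_iters):
        data = rng.choice(tasks)           # sample one simulated speaker
        phi = sgd_adapt(theta, data)       # inner loop: adapt to that speaker
        # outer loop: theta <- theta + meta_lr * (phi - theta)
        theta = tuple(t + meta_lr * (p - t) for t, p in zip(theta, phi))
    return theta


def make_task(w, b, rng, n=10):
    """Toy task: n noiseless samples of y = w * x + b."""
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, w * x + b) for x in xs]


rng = random.Random(1)
# Toy "speakers" cluster around (w=3, b=2), far from the base model at (0, 0).
tasks = [
    make_task(3 + rng.uniform(-0.5, 0.5), 2 + rng.uniform(-0.5, 0.5), rng)
    for _ in range(8)
]
base = (0.0, 0.0)
meta = reptile(base, tasks)

# Adapt both initializations to an unseen task with the same small budget.
new_task = make_task(3.2, 1.9, rng)
loss_base = mse(sgd_adapt(base, new_task), new_task)
loss_meta = mse(sgd_adapt(meta, new_task), new_task)
```

In this toy setting, the same small fine-tuning budget leaves a lower loss when starting from the meta-learned initialization (`loss_meta`) than from the mismatched base model (`loss_base`), which mirrors the motivation for re-initialization described in the abstract.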

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Music and Audio Processing