Speaker adaptation for Wav2vec2 based dysarthric ASR

Murali Karthick Baskar; Tim Herzig; Diana Nguyen; Mireia Diez; Tim Polzehl; Lukáš Burget; Jan "Honza" Černocký

arXiv:2204.00770 · cs.SD · April 5, 2022

Open Access

TL;DR

This paper introduces a simple, flexible speaker adaptation network for wav2vec2-based dysarthric speech recognition, improving performance across severity levels and domains by integrating speaker adaptive features during fine-tuning.

Contribution

It proposes a novel adaptation network for wav2vec2 that incorporates fMLLR features and xvectors during fine-tuning, enhancing dysarthric speech recognition performance.
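
To make the idea of feeding speaker adaptive features into the model more concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: this summary does not say where or how the features are fused, so the additive fusion, the single linear layer with ReLU, the toy dimensions, and every name (`init_layer`, `adapt`, `ENC_DIM`, and so on) are assumptions made for illustration. Random numbers stand in for real wav2vec2 encoder outputs, fMLLR features, and xvectors.

```python
import random


def linear(x, weight, bias):
    """Apply y = W x + b to one frame (a list of floats)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in x]


def init_layer(in_dim, out_dim, rng):
    """Create a small randomly initialised linear layer (hypothetical helper)."""
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def adapt(encoder_frames, speaker_frames, layer):
    """Project each speaker feature frame and add it to the matching encoder frame.

    Assumption: additive fusion with one frame of speaker features per encoder frame.
    """
    if len(encoder_frames) != len(speaker_frames):
        raise ValueError("encoder and speaker features need the same number of frames")
    weight, bias = layer
    return [
        [e + a for e, a in zip(enc, relu(linear(spk, weight, bias)))]
        for enc, spk in zip(encoder_frames, speaker_frames)
    ]


rng = random.Random(0)
ENC_DIM, FMLLR_DIM, XVEC_DIM, NUM_FRAMES = 8, 4, 6, 5  # toy sizes, not the real ones

# Stand-ins for wav2vec2 encoder outputs and frame-level fMLLR features.
encoder_frames = [[rng.gauss(0, 1) for _ in range(ENC_DIM)] for _ in range(NUM_FRAMES)]
fmllr_frames = [[rng.gauss(0, 1) for _ in range(FMLLR_DIM)] for _ in range(NUM_FRAMES)]
fmllr_layer = init_layer(FMLLR_DIM, ENC_DIM, rng)
adapted_fmllr = adapt(encoder_frames, fmllr_frames, fmllr_layer)

# An xvector is one vector per utterance, so it is repeated for every frame here.
xvector = [rng.gauss(0, 1) for _ in range(XVEC_DIM)]
xvec_layer = init_layer(XVEC_DIM, ENC_DIM, rng)
adapted_xvec = adapt(encoder_frames, [xvector] * NUM_FRAMES, xvec_layer)
```

The same `adapt` function serves both feature types because only the input dimension of the projection changes, which is one plausible reading of the flexibility the summary describes. In a real system the projection would be trained jointly with wav2vec2 during fine-tuning, and the frame rates of the two feature streams would need to be aligned first.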

Findings

01

Achieved 57.72% WER at the high severity level on the UASpeech dataset.

02

Demonstrated consistent improvements across all impairment severity levels.

03

Validated the approach on a German dataset for cross-domain robustness.
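
Since the headline result is reported as a word error rate (WER), a short sketch of how WER is conventionally computed may help in reading the 57.72% figure. WER is the word-level edit distance (substitutions, deletions, insertions) between the reference and the recognizer output, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for this example; the paper's own scoring setup is not described in this summary.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: 100 * (substitutions + deletions + insertions) / reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference, so it is an error ratio rather than a bounded accuracy score.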

Abstract

Dysarthric speech recognition poses major challenges due to the lack of training data and the heavy mismatch in speaker characteristics. Recent ASR systems have benefited from readily available pretrained models such as wav2vec2 to improve recognition performance. Speaker adaptation using fMLLR and xvectors has provided major gains for dysarthric speech with very little adaptation data. However, integration of wav2vec2 with fMLLR features or xvectors during wav2vec2 fine-tuning is yet to be explored. In this work, we propose a simple adaptation network for fine-tuning wav2vec2 using fMLLR features. The adaptation network is also flexible enough to handle other speaker adaptive features such as xvectors. Experimental analysis shows steady improvements with our proposed approach across all impairment severity levels, and the approach attains 57.72% WER for high severity on the UASpeech dataset. We also performed…

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Phonetics and Phonology Research