Hyper-parameter Adaptation of Conformer ASR Systems for Elderly and Dysarthric Speech Recognition

Tianzi Wang; Shoukang Hu; Jiajun Deng; Zengrui Jin; Mengzhe Geng; Yi Wang; Helen Meng; Xunying Liu

arXiv:2306.15265·eess.AS·June 28, 2023

TL;DR

This paper explores hyper-parameter adaptation for Conformer ASR systems pre-trained on general speech data, improving recognition accuracy for elderly and dysarthric speech through domain-specific tuning.

Contribution

It introduces hyper-parameter adaptation techniques for Conformer ASR models, demonstrating improved performance on elderly and dysarthric speech datasets beyond standard fine-tuning.

Findings

01

Hyper-parameter adaptation reduces WER by 0.45% over parameter-only fine-tuning on DementiaBank.

02

Hyper-parameter adaptation reduces WER by 0.67% over parameter-only fine-tuning on UASpeech.

03

Performance improvements from hyper-parameter adaptation correlate with the relative utterance length ratio.

Abstract

Automatic recognition of disordered and elderly speech remains a highly challenging task to date due to data scarcity. Parameter fine-tuning is often used to exploit models pre-trained on large quantities of non-aged and healthy speech, while neural architecture hyper-parameters are set using expert knowledge and remain unchanged. This paper investigates hyper-parameter adaptation for Conformer ASR systems that are pre-trained on the Librispeech corpus before being domain-adapted to the DementiaBank elderly and UASpeech dysarthric speech datasets. Experimental results suggest that hyper-parameter adaptation produced word error rate (WER) reductions of 0.45% and 0.67% over parameter-only fine-tuning on the DBank and UASpeech tasks respectively. An intuitive correlation is found between the performance improvements by hyper-parameter domain adaptation and the relative utterance length ratio…
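
For readers unfamiliar with the metric quoted in the abstract, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the recognizer output, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the paper's scoring pipeline; the function name `word_error_rate` and the example sentences are hypothetical, and no text normalization is applied.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; real scoring tools usually also
    normalize case and punctuation, which this sketch does not do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up transcripts: one reference word is missing from the hypothesis.
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER = {wer:.2%}")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.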

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Phonetics and Phonology Research