Fine-Tuning Automatic Speech Recognition for People with Parkinson's: An Effective Strategy for Enhancing Speech Technology Accessibility

Xiuwen Zheng; Bornali Phukon; Mark Hasegawa-Johnson

arXiv:2409.19818·eess.AS·October 1, 2024

TL;DR

This study demonstrates that fine-tuning pretrained ASR models with a multi-task learning approach on speech data from people with Parkinson's significantly improves recognition accuracy, making speech technology more accessible for this group.

Contribution

It introduces a multi-task learning fine-tuning method that incorporates impairment severity estimation, achieving substantial WER reductions for Parkinson's speech recognition.
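
The multi-task idea described above can be sketched as a combined training objective: the usual ASR loss plus a weighted auxiliary loss for severity estimation. This is only an illustrative sketch, not the authors' implementation. The function names, the cross-entropy form of the severity loss, the number of severity classes, and the `aux_weight` value are all assumptions made for this example; the summary does not specify them.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of one example, computed from raw class logits."""
    # Subtract the max logit before exponentiating for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def multitask_loss(
    asr_loss: float,
    severity_logits: Sequence[float],
    severity_label: int,
    aux_weight: float = 0.1,  # hypothetical weight, not taken from the paper
) -> float:
    """Primary ASR loss plus a weighted auxiliary severity-classification loss."""
    return asr_loss + aux_weight * cross_entropy(severity_logits, severity_label)


# Example call with made-up values: an ASR loss of 2.0 and four
# hypothetical severity classes with uniform logits.
total = multitask_loss(2.0, [0.0, 0.0, 0.0, 0.0], severity_label=1, aux_weight=0.5)
```

In a real system the ASR loss would come from the pretrained model's own objective and the severity logits from an extra output head, with gradients from both terms flowing into the shared encoder.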

Findings

01

Word error rate improved by 26.97% to 37.62% compared to baseline models fine-tuned only on LibriSpeech (960h and 100h, respectively).

02

Multi-task learning with impairment severity estimation yields the best results.

03

Fine-tuning on Parkinson's speech data enhances ASR accessibility for affected individuals.

Abstract

This paper enhances dysarthric and dysphonic speech recognition by fine-tuning pretrained automatic speech recognition (ASR) models on the 2023-10-05 data package of the Speech Accessibility Project (SAP), which contains the speech of 253 people with Parkinson's disease. Experiments tested methods that have been effective for cerebral palsy, including the use of speaker clustering and severity-dependent models, weighted fine-tuning, and multi-task learning. Best results were obtained using a multi-task learning model, in which the ASR is trained to produce an estimate of the speaker's impairment severity as an auxiliary output. The resulting word error rates are considerably improved relative to a baseline model fine-tuned using only LibriSpeech data, with word error rate improvements of 37.62% and 26.97% compared to fine-tuning on 100h and 960h of LibriSpeech data, respectively.
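
For readers unfamiliar with the metric, the following minimal sketch shows how word error rate (WER) is conventionally computed from a word-level edit distance, and how a relative improvement over a baseline can be derived from two WER values. It assumes the improvements quoted above are relative reductions, which the abstract does not state explicitly. The function names and the example sentences are hypothetical, and no WER values from the paper are reproduced here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                       # deletion
                curr[j - 1] + 1,                   # insertion
                prev[j - 1] + substitution_cost,   # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction over a baseline, in percent."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return 100.0 * (baseline_wer - new_wer) / baseline_wer


# Illustrative usage with made-up sentences: one deleted word out of six.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
```

Under this definition, a system that lowers WER from a hypothetical 0.40 to 0.30 achieves a relative reduction of about 25%, even though the absolute difference is 10 percentage points.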
