Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech

Karl El Hajal; Enno Hermann; Sevada Hovsepyan; Mathew Magimai.-Doss

arXiv:2506.01618 · eess.AS · June 3, 2025

TL;DR

This paper converts dysarthric speech toward healthy speech using unsupervised rhythm and voice conversion, aiming to improve automatic speech recognition (ASR) on dysarthric speech, which ASR systems struggle with because of high inter-speaker variability and slow speaking rates.

Contribution

It extends the Rhythm and Voice (RnV) conversion framework with a syllable-based rhythm modeling method suited to dysarthric speech, and assesses its impact on ASR performance.

Findings

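01 LF-MMI models trained on converted speech achieve significant word error rate reductions on the Torgo corpus.

02 Fine-tuning Whisper on converted speech has minimal effect on its performance.

03 The word error rate reductions are especially pronounced for more severe cases of dysarthria.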

Abstract

Automatic speech recognition (ASR) systems struggle with dysarthric speech due to high inter-speaker variability and slow speaking rates. To address this, we explore dysarthric-to-healthy speech conversion for improved ASR performance. Our approach extends the Rhythm and Voice (RnV) conversion framework by introducing a syllable-based rhythm modeling method suited for dysarthric speech. We assess its impact on ASR by training LF-MMI models and fine-tuning Whisper on converted speech. Experiments on the Torgo corpus reveal that LF-MMI achieves significant word error rate reductions, especially for more severe cases of dysarthria, while fine-tuning Whisper on converted data has minimal effect on its performance. These results highlight the potential of unsupervised rhythm and voice conversion for dysarthric ASR. Code available at: https://github.com/idiap/RnV
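The abstract reports its results as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper or the RnV repository, and the paper's own scoring pipeline may apply text normalization that this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); whitespace tokenization only,
    with no case folding or punctuation handling.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far (excluding the current word) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up transcripts: one reference word ("the") is dropped.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference, so it is an error rate but not strictly a percentage of words wrong.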

Code & Models

Repositories

idiap/rnv (PyTorch, official)

Taxonomy

Topics: Phonetics and Phonology Research · Speech Recognition and Synthesis · Voice and Speech Disorders