Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR

Karl El Hajal; Enno Hermann; Ajinkya Kulkarni; Mathew Magimai.-Doss

arXiv:2501.10256·eess.AS·January 20, 2025

Open Access

TL;DR

This paper introduces an unsupervised method that converts dysarthric speech to typical speech using self-supervised speech representations. It needs no transcribed data for unseen speakers and improves ASR performance, especially for speakers with more severe dysarthria.

Contribution

It proposes an unsupervised rhythm and voice conversion approach that does not depend on transcribed data and improves ASR accuracy on dysarthric speech.

Findings

01 Rhythm conversion improves ASR accuracy for severe dysarthria

02 Unsupervised methods outperform rate modification approaches

03 Effective on large pre-trained ASR models

Abstract

Automatic speech recognition (ASR) systems are well known to perform poorly on dysarthric speech. Previous works have addressed this by speaking rate modification to reduce the mismatch with typical speech. Unfortunately, these approaches rely on transcribed speech data to estimate speaking rates and phoneme durations, which might not be available for unseen speakers. Therefore, we combine unsupervised rhythm and voice conversion methods based on self-supervised speech representations to map dysarthric to typical speech. We evaluate the outputs with a large ASR model pre-trained on healthy speech without further fine-tuning and find that the proposed rhythm conversion especially improves performance for speakers of the Torgo corpus with more severe cases of dysarthria. Code and audio samples are available at https://idiap.github.io/RnV.
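The abstract does not state which conversion algorithm is used. As a minimal sketch of how voice conversion on self-supervised representations can work without transcripts, the example below assumes a nearest-neighbour matching scheme: each frame of the source utterance is replaced by the average of its most similar frames from a pool of typical-speech features. The function name `knn_convert`, the feature dimension, and the random arrays standing in for real features are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def knn_convert(source_feats, target_pool, k=4):
    """Replace each source frame by the mean of its k most similar target frames.

    Hypothetical helper for illustration; similarity is cosine similarity.
    source_feats: (T_src, D) array, target_pool: (N_tgt, D) array, k <= N_tgt.
    """
    # L2-normalise rows so that a dot product equals cosine similarity
    src = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    tgt = target_pool / np.linalg.norm(target_pool, axis=1, keepdims=True)
    sim = src @ tgt.T  # shape (T_src, N_tgt)
    # Indices of the k most similar target frames for every source frame
    top_k = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    # Average the selected (unnormalised) target frames: (T_src, k, D) -> (T_src, D)
    return target_pool[top_k].mean(axis=1)


# Random arrays stand in for self-supervised features (assumption: 16-dim frames)
rng = np.random.default_rng(0)
source_feats = rng.normal(size=(50, 16))   # e.g. frames from a dysarthric utterance
target_pool = rng.normal(size=(200, 16))   # e.g. frames from typical speech
converted = knn_convert(source_feats, target_pool, k=4)
print(converted.shape)
```

The output keeps the source frame count, so this step alone leaves timing untouched. Changing rhythm would need a separate stage that alters how many frames each segment spans, and turning the converted features back into audio would need a vocoder, neither of which is shown here.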

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Phonetics and Phonology Research