Improved Dysarthric Speech to Text Conversion via TTS Personalization

Péter Mihajlik; Éva Székely; Piroska Barta; Máté Soma Kádár; Gergely Dobsinszki; László Tóth

arXiv:2508.06391 · cs.SD · August 11, 2025

TL;DR

This case study shows that fine-tuning a speech-to-text model on both real and TTS-generated synthetic dysarthric speech substantially improves transcription accuracy for a Hungarian speaker with severe dysarthria, reducing the character error rate (CER) from 36-51% (zero-shot) to 7.3%.

Contribution

The study introduces a method for generating synthetic dysarthric speech with controlled severity, based on the speaker's premorbidity recordings and speaker embedding interpolation, and uses it for personalized ASR fine-tuning.

Findings

01. CER reduced from 36-51% (zero-shot) to 7.3% after fine-tuning on real and synthetic dysarthric speech

02. Including synthetic speech yields an 18% relative CER reduction

03. Personalized models outperform general models like Whisper-turbo
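For readers unfamiliar with the metric, the sketch below shows how CER and a relative CER reduction are conventionally computed: character-level edit distance divided by reference length, and (baseline - improved) / baseline. It is a minimal illustration, not the authors' evaluation code; the function names and the toy strings are assumptions made for this example.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Relative error reduction, e.g. 0.18 means 18% relative."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Toy strings (hypothetical, not taken from the paper's data).
    print(cer("kutya", "kutia"))
    # Arithmetic on the figures reported above: zero-shot 36% and 51% vs. 7.3%.
    print(relative_reduction(0.36, 0.073))
    print(relative_reduction(0.51, 0.073))
```

Applied to the reported figures, going from 36% or 51% zero-shot CER to 7.3% corresponds to a relative reduction of roughly 80% and 86%, respectively. The 18% relative figure in finding 02 is a separate comparison whose baseline is not stated in this summary.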

Abstract

We present a case study on developing a customized speech-to-text system for a Hungarian speaker with severe dysarthria. State-of-the-art automatic speech recognition (ASR) models struggle with zero-shot transcription of dysarthric speech, yielding high error rates. To improve performance with limited real dysarthric data, we fine-tune an ASR model using synthetic speech generated via a personalized text-to-speech (TTS) system. We introduce a method for generating synthetic dysarthric speech with controlled severity by leveraging premorbidity recordings of the given speaker and speaker embedding interpolation, enabling ASR fine-tuning on a continuum of impairments. Fine-tuning on both real and synthetic dysarthric speech reduces the character error rate (CER) from 36-51% (zero-shot) to 7.3%. Our monolingual FastConformer_Hu ASR model significantly outperforms Whisper-turbo when…
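The idea of a severity continuum via speaker embedding interpolation can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not specify the interpolation scheme, so plain linear interpolation between a premorbid and a dysarthric embedding is assumed here, and the function names and the two-dimensional toy vectors are hypothetical. In a real system the resulting vectors would condition a TTS model, which is not shown.

```python
from typing import List, Sequence


def interpolate_embeddings(premorbid: Sequence[float],
                           dysarthric: Sequence[float],
                           alpha: float) -> List[float]:
    """Linear blend of two speaker embeddings (assumed scheme).

    alpha = 0.0 returns the premorbid embedding, alpha = 1.0 the dysarthric one.
    """
    if len(premorbid) != len(dysarthric):
        raise ValueError("embeddings must have the same dimension")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [(1.0 - alpha) * p + alpha * d
            for p, d in zip(premorbid, dysarthric)]


def severity_continuum(premorbid: Sequence[float],
                       dysarthric: Sequence[float],
                       steps: int) -> List[List[float]]:
    """Evenly spaced embeddings from premorbid (first) to dysarthric (last)."""
    if steps < 2:
        raise ValueError("need at least two steps")
    return [interpolate_embeddings(premorbid, dysarthric, i / (steps - 1))
            for i in range(steps)]


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real speaker embeddings are much larger.
    e_pre = [0.0, 1.0]
    e_dys = [1.0, 0.0]
    for emb in severity_continuum(e_pre, e_dys, steps=5):
        print(emb)
```

Each intermediate embedding stands for a hypothetical point between the speaker's premorbid and current voice, which is what would allow synthetic fine-tuning data to cover a range of impairment levels rather than a single one.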

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Phonetics and Phonology Research