Can we reconstruct a dysarthric voice with the large speech model Parler TTS?

Ariadna Sanchez; Simon King

arXiv:2506.04397 · eess.AS · September 26, 2025

TL;DR

This paper explores fine-tuning the large speech model Parler TTS to reconstruct dysarthric voices, aiming to generate intelligible speech that preserves speaker identity. The model learns to generate from the dysarthric data, but it struggles to control intelligibility and to keep speaker identity consistent.

Contribution

The paper demonstrates the potential of fine-tuning a large speech model for dysarthric voice reconstruction and highlights key challenges in controlling intelligibility and maintaining speaker consistency.

Findings

01 Model can learn from challenging dysarthric data

02 Struggles with controlling intelligibility

03 Has difficulty maintaining speaker identity

Abstract

Speech disorders can make communication hard or even impossible for those who develop them. Personalised Text-to-Speech is an attractive option as a communication aid. We attempt voice reconstruction using a large speech model, with which we generate an approximation of a dysarthric speaker's voice prior to the onset of their condition. In particular, we investigate whether a state-of-the-art large speech model, Parler TTS, can generate intelligible speech while maintaining speaker identity. We curate a dataset and annotate it with relevant speaker and intelligibility information, and use this to fine-tune the model. Our results show that the model can indeed learn to generate from the distribution of this challenging data, but struggles to control intelligibility and to maintain consistent speaker identity. We propose future directions to improve controllability of this class of model,…
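
Parler TTS is conditioned on a free-text description of the voice alongside the text to be spoken, so one plausible way to use speaker and intelligibility annotations during fine-tuning is to fold them into that description. The sketch below illustrates the idea only. The abstract does not specify the annotation format, so the `UtteranceAnnotation` fields, the label values, and the wording in `INTELLIGIBILITY_PHRASES` are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical annotation schema: field names and label values are
# illustrative assumptions, not the labels used in the paper.
@dataclass
class UtteranceAnnotation:
    speaker_name: str
    gender: str
    intelligibility: str  # assumed labels: "low", "medium", "high"
    transcript: str


# Assumed mapping from an intelligibility label to descriptive wording.
INTELLIGIBILITY_PHRASES = {
    "low": "speaks with very unclear articulation",
    "medium": "speaks with somewhat unclear articulation",
    "high": "speaks with clear articulation",
}


def build_description(ann: UtteranceAnnotation) -> str:
    """Compose a natural-language voice description for one utterance."""
    phrase = INTELLIGIBILITY_PHRASES[ann.intelligibility]
    return f"{ann.speaker_name}, a {ann.gender} speaker, {phrase}."


def build_training_pair(ann: UtteranceAnnotation) -> dict:
    """Pair the voice description with the text to be spoken."""
    return {"description": build_description(ann), "prompt": ann.transcript}


example = UtteranceAnnotation(
    speaker_name="Speaker A",
    gender="female",
    intelligibility="low",
    transcript="Please open the window.",
)
pair = build_training_pair(example)
print(pair["description"])
```

Under this scheme, voice reconstruction at inference time would amount to keeping the speaker name fixed while requesting the "high" intelligibility wording. The abstract reports that this kind of control is exactly where the fine-tuned model struggles.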
