Prototype-Based Disentanglement for Controllable Dysarthric Speech Synthesis

Haoshen Wang; Xueli Zhong; Bingbing Lin; Jia Huang; Xingduo Pan; Shengxiang Liang; Nizhuan Wang; Wai Ting Siok

arXiv:2602.08696 · cs.SD · February 20, 2026

Open Access

TL;DR

This paper introduces ProtoDisent-TTS, a prototype-based TTS framework that disentangles speaker identity from dysarthric speech patterns, enabling controllable synthesis and improved robustness for dysarthric speech applications.

Contribution

The paper presents a novel prototype-based disentanglement method that separates speaker identity from pathological (dysarthric) features in speech synthesis, improving controllability and robustness.
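The contribution centers on a pathology prototype codebook, but this page does not say how the codebook is queried. The following is only a minimal sketch under stated assumptions. A query vector is softly assigned to a small set of prototype vectors by cosine similarity with a temperature-scaled softmax. Control is exercised by linearly interpolating between a healthy and a dysarthric prototype. The names `attend_prototypes` and `interpolate`, the temperature value, and the toy three-prototype codebook are hypothetical and are not taken from ProtoDisent-TTS.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def softmax(scores: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_prototypes(query: Vector, codebook: List[Vector],
                      temperature: float = 0.1) -> Tuple[List[float], Vector]:
    """Hypothetical soft lookup: weight each prototype by similarity to the query.

    The weights are interpretable (which pathology pattern the query resembles),
    and the weighted sum of prototypes serves as the pathology embedding.
    """
    weights = softmax([cosine(query, p) for p in codebook], temperature)
    dim = len(codebook[0])
    mixed = [sum(w * p[d] for w, p in zip(weights, codebook)) for d in range(dim)]
    return weights, mixed


def interpolate(healthy: Vector, dysarthric: Vector, alpha: float) -> Vector:
    """Blend two prototypes: alpha=0 gives healthy, alpha=1 gives dysarthric."""
    return [(1 - alpha) * h + alpha * d for h, d in zip(healthy, dysarthric)]


if __name__ == "__main__":
    # Toy codebook (hypothetical): index 0 = healthy, 1 = mild, 2 = severe.
    codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    weights, pathology_emb = attend_prototypes([0.9, 0.1, 0.0], codebook)
    print([round(w, 3) for w in weights])
    # Move halfway from the healthy prototype toward the severe one.
    print(interpolate(codebook[0], codebook[2], 0.5))  # -> [0.5, 0.0, 0.5]
```

In the factorized setting described in the abstract, a pathology embedding like this would be paired with a separate speaker embedding to condition the TTS backbone. How the two are combined is not stated on this page.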

Findings

1. Enables bidirectional transformation between healthy and dysarthric speech.
2. Improves ASR performance on dysarthric speech.
3. Provides interpretable and controllable speech representations.

Abstract

Dysarthric speech exhibits high variability and limited labeled data, posing major challenges for both automatic speech recognition (ASR) and assistive speech technologies. Existing approaches rely on synthetic data augmentation or speech reconstruction, yet often entangle speaker identity with pathological articulation, limiting controllability and robustness. In this paper, we propose ProtoDisent-TTS, a prototype-based disentanglement TTS framework built on a pre-trained text-to-speech backbone that factorizes speaker timbre and dysarthric articulation within a unified latent space. A pathology prototype codebook provides interpretable and controllable representations of healthy and dysarthric speech patterns, while a dual-classifier objective with a gradient reversal layer enforces invariance of speaker embeddings to pathological attributes. Experiments on the TORGO dataset…
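The gradient reversal layer (GRL) named in the abstract is a standard adversarial device. It passes its input through unchanged in the forward pass and multiplies the incoming gradient by −λ in the backward pass. As a result, the encoder producing the speaker embedding is pushed to make the pathology classifier fail, while that classifier still trains normally on its own weights. The sketch below uses plain Python lists and hand-written gradients to show the arithmetic. In a framework with automatic differentiation, a GRL would normally be written as a custom autograd function. The class and function names, the logistic pathology classifier, and the value of λ are illustrative assumptions, not the paper's implementation.

```python
import math
from dataclasses import dataclass
from typing import List

Vector = List[float]


@dataclass
class GradientReversal:
    """Illustrative GRL: identity forward, gradient scaled by -lambd backward."""
    lambd: float = 1.0

    def forward(self, x: Vector) -> Vector:
        return list(x)

    def backward(self, grad_out: Vector) -> Vector:
        return [-self.lambd * g for g in grad_out]


def logistic_grad(w: Vector, s: Vector, label: int) -> Vector:
    """Gradient w.r.t. s of binary cross-entropy for p = sigmoid(w . s).

    Hypothetical pathology classifier: label 1 = dysarthric, 0 = healthy.
    """
    z = sum(wi * si for wi, si in zip(w, s))
    p = 1.0 / (1.0 + math.exp(-z))
    return [(p - label) * wi for wi in w]


def encoder_gradient(grad_spk: Vector, grad_path: Vector,
                     grl: GradientReversal) -> Vector:
    """Total gradient reaching the speaker embedding s.

    grad_spk flows back from the speaker classifier unchanged; grad_path flows
    back from the pathology classifier through the GRL and is therefore reversed.
    (The pathology classifier's own weights would use the unreversed gradient.)
    """
    return [a + b for a, b in zip(grad_spk, grl.backward(grad_path))]


if __name__ == "__main__":
    grl = GradientReversal(lambd=1.0)
    w_path = [1.0, 0.0]          # toy pathology classifier weights
    s = grl.forward([2.0, 0.0])  # speaker embedding; the GRL is identity here
    g_path = logistic_grad(w_path, s, label=1)
    g_total = encoder_gradient([0.0, 0.0], g_path, grl)
    lr = 0.5
    s_new = [si - lr * gi for si, gi in zip(s, g_total)]
    # The step lowers w . s, so the classifier is less sure s is dysarthric.
    print(s, "->", s_new)
```

With λ = 1 and no speaker-classifier gradient, one descent step on an embedding that the pathology classifier labels correctly lowers that classifier's confidence. This is the invariance pressure the dual-classifier objective is meant to apply to speaker embeddings.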

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Phonocardiography and Auscultation Techniques