Facilitating Personalized TTS for Dysarthric Speakers Using Knowledge Anchoring and Curriculum Learning
Yejin Jeon, Solee Im, Youngjae Kim, Gary Geunbae Lee

TL;DR
This paper introduces a novel knowledge anchoring and curriculum learning approach to improve personalized text-to-speech synthesis for dysarthric speakers, overcoming data scarcity and articulation challenges.
Contribution
It presents a zero-shot multi-speaker TTS model that leverages a teacher-student framework with curriculum learning to enhance speech quality for dysarthric speakers.
Findings
Significantly reduces articulation errors in synthetic speech.
Achieves high speaker fidelity and prosodic naturalness.
Effective in low-data, zero-shot scenarios.
Abstract
Dysarthric speakers experience substantial communication challenges due to impaired motor control of the speech apparatus, which leads to reduced speech intelligibility. This creates significant obstacles in dataset curation since actual recording of long, articulate sentences for the objective of training personalized TTS models becomes infeasible. Thus, the limited availability of audio data, in addition to the articulation errors that are present within the audio, complicates personalized speech synthesis for target dysarthric speaker adaptation. To address this, we frame the issue as a domain transfer task and introduce a knowledge anchoring framework that leverages a teacher-student model, enhanced by curriculum learning through audio augmentation. Experimental results show that the proposed zero-shot multi-speaker TTS model effectively generates synthetic speech with markedly…
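The abstract names two ingredients, a teacher-student model and curriculum learning through audio augmentation, without giving details here. The following is a minimal sketch of how such a pairing might look, not the authors' implementation. It assumes that augmentation strength ramps linearly from easy to hard over training and that the student is pulled toward a frozen teacher's output with a mean-squared-error term. The names `curriculum_strength`, `augment`, and `anchoring_loss`, the noise-based augmentation, and the linear schedule are all hypothetical.

```python
import random


def curriculum_strength(epoch, total_epochs, max_strength=1.0):
    """Hypothetical schedule: ramp augmentation strength linearly from 0 to max_strength."""
    if total_epochs <= 1:
        return max_strength
    return max_strength * min(epoch, total_epochs - 1) / (total_epochs - 1)


def augment(features, strength, rng):
    """Add uniform noise scaled by strength (a stand-in for any audio augmentation)."""
    return [x + strength * rng.uniform(-1.0, 1.0) for x in features]


def anchoring_loss(student_out, teacher_out):
    """Mean squared error between student output and frozen teacher output (assumed loss)."""
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(teacher_out)


rng = random.Random(0)
clean = [0.2, -0.1, 0.4]   # toy feature vector, not real audio
teacher_out = clean        # placeholder: the teacher sees the clean input
total_epochs = 5

for epoch in range(total_epochs):
    strength = curriculum_strength(epoch, total_epochs)
    student_in = augment(clean, strength, rng)
    # An identity "student" is used purely for illustration; a real model would go here.
    loss = anchoring_loss(student_in, teacher_out)
    print(f"epoch={epoch} strength={strength:.2f} loss={loss:.4f}")
```

In this toy setup the loss is zero at epoch 0 because the student input is unaugmented; later epochs present progressively harder inputs while the teacher target stays fixed.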
