DiffDSR: Dysarthric Speech Reconstruction Using Latent Diffusion Model

Xueyuan Chen; Dongchao Yang; Wenxuan Wu; Minglin Wu; Jing Xu; Xixin Wu; Zhiyong Wu; Helen Meng

arXiv:2506.00350·cs.SD·June 3, 2025

TL;DR

This paper introduces DiffDSR, a diffusion-based system that reconstructs dysarthric speech into clear, speaker-identifiable speech using a latent diffusion model and self-supervised learning, significantly improving intelligibility and speaker similarity.

Contribution

The paper presents a latent diffusion system for dysarthric speech reconstruction that combines an SSL-based speech content encoder (for restoring phoneme embeddings) and an in-context speaker identity encoder with a diffusion-based speech generator.

Findings

01. Enhanced speech intelligibility on the UASpeech corpus

02. Improved speaker similarity in reconstructed speech

03. Outperforms existing dysarthric speech reconstruction methods

Abstract

Dysarthric speech reconstruction (DSR) aims to convert dysarthric speech into comprehensible speech while maintaining the speaker's identity. Despite significant advancements, existing methods often struggle with low speech intelligibility and poor speaker similarity. In this study, we introduce a novel diffusion-based DSR system that leverages a latent diffusion model to enhance the quality of speech reconstruction. Our model comprises: (i) a speech content encoder that restores phoneme embeddings using pre-trained self-supervised learning (SSL) speech foundation models; (ii) a speaker identity encoder that preserves speaker identity via an in-context learning mechanism; (iii) a diffusion-based speech generator that reconstructs the speech from the restored phoneme embedding and the preserved speaker identity. Through evaluations on the widely-used UASpeech corpus, our proposed model…
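The three components listed in the abstract can be pictured as a simple data flow. The sketch below is only a toy illustration of that flow in plain Python, not the authors' implementation: every function name, dimension, and the hand-written "denoiser" are assumptions made for illustration. A real system would use a pre-trained SSL model, a learned speaker encoder, and a trained denoising network.

```python
import random
from typing import List

Vector = List[float]


def content_encoder(dysarthric_frames: List[Vector], dim: int = 4) -> List[Vector]:
    """Hypothetical stand-in for the SSL-based speech content encoder.

    Returns one toy "phoneme embedding" per input frame (the frame mean,
    repeated dim times). A real encoder would be a pre-trained SSL model.
    """
    return [[sum(f) / len(f)] * dim for f in dysarthric_frames]


def speaker_encoder(prompt_frames: List[Vector], dim: int = 4) -> Vector:
    """Hypothetical stand-in for the speaker identity encoder.

    Summarises a speech prompt as one vector (the global mean, repeated).
    """
    flat = [x for frame in prompt_frames for x in frame]
    return [sum(flat) / len(flat)] * dim


def denoise_step(latent: Vector, phoneme_emb: Vector, speaker_emb: Vector,
                 t: int, num_steps: int) -> Vector:
    """Toy denoiser: moves the latent toward a conditioning-defined target.

    This replaces the learned denoising network purely for illustration.
    """
    target = [p + s for p, s in zip(phoneme_emb, speaker_emb)]
    weight = 1.0 / (num_steps - t)  # reaches 1.0 on the final step
    return [z + weight * (tgt - z) for z, tgt in zip(latent, target)]


def diffusion_generator(phoneme_embs: List[Vector], speaker_emb: Vector,
                        num_steps: int = 10, seed: int = 0) -> List[Vector]:
    """Starts each latent from Gaussian noise and refines it iteratively."""
    rng = random.Random(seed)
    latents = []
    for emb in phoneme_embs:
        latent = [rng.gauss(0.0, 1.0) for _ in emb]
        for t in range(num_steps):
            latent = denoise_step(latent, emb, speaker_emb, t, num_steps)
        latents.append(latent)
    return latents


def reconstruct(dysarthric_frames: List[Vector],
                prompt_frames: List[Vector]) -> List[Vector]:
    """Content encoder -> speaker encoder -> conditioned diffusion generator."""
    phoneme_embs = content_encoder(dysarthric_frames)
    speaker_emb = speaker_encoder(prompt_frames)
    return diffusion_generator(phoneme_embs, speaker_emb)
```

Calling `reconstruct(frames, prompt)` returns one latent vector per input frame. In an actual latent diffusion system these latents would then be decoded back to a waveform, a step this sketch leaves out.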

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Phonetics and Phonology Research