Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching

Shoutrik Das; Nishant Singh; Arjun Gangwar; S Umesh

arXiv:2506.16127 · cs.SD · June 23, 2025

TL;DR

This paper introduces a non-autoregressive method for converting dysarthric speech to regular speech. It uses Conditional Flow Matching (CFM) with Diffusion Transformers and improves intelligibility by leveraging self-supervised learning (SSL) features and discrete acoustic units.
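Discrete acoustic units are commonly obtained by quantizing frame-level SSL features against a k-means codebook. The sketch below illustrates that general recipe under this assumption and does not reproduce the authors' pipeline. The helper names `fit_kmeans` and `quantize`, the farthest-point initialization, and the synthetic 8-dimensional features standing in for WavLM frames are all hypothetical. The codebook size and the WavLM layer the paper uses are not stated here.

```python
import numpy as np

def quantize(feats, centroids):
    """Map each frame of a (T, D) feature array to its nearest centroid index -> (T,) ints."""
    # Squared Euclidean distance from every frame to every centroid: shape (T, k).
    d2 = ((feats[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)

def fit_kmeans(feats, k, iters=20, seed=0):
    """Lloyd's k-means with farthest-point initialization; returns a (k, D) codebook."""
    rng = np.random.default_rng(seed)
    centroids = [feats[rng.integers(len(feats))]]
    for _ in range(1, k):
        # Next seed: the frame farthest from all centroids chosen so far.
        d2 = ((feats[:, None, :] - np.array(centroids)[None]) ** 2).sum(axis=-1).min(axis=1)
        centroids.append(feats[d2.argmax()])
    centroids = np.array(centroids, dtype=float)
    for _ in range(iters):
        labels = quantize(feats, centroids)
        for j in range(k):
            members = feats[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

# Hypothetical stand-in for WavLM frame features: two well-separated groups of frames.
rng = np.random.default_rng(1)
feats = np.concatenate([rng.normal(0.0, 0.1, (50, 8)), rng.normal(5.0, 0.1, (50, 8))])
codebook = fit_kmeans(feats, k=2)
units = quantize(feats, codebook)  # one discrete unit ID per frame
```

With real data, `feats` would be a (T, D) array of frames from one WavLM layer. The resulting integer sequence is the kind of representation the paper studies as an alternative to mel-spectrograms.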

Contribution

It proposes a novel non-autoregressive approach that combines CFM with Diffusion Transformers for dysarthric speech conversion, improving both intelligibility and convergence speed.
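The following sketch shows what a CFM training step optimizes. It assumes the linear (optimal-transport) conditional path commonly used in flow matching: noise x0 moves toward the clean target x1 along x_t = (1 - (1 - σ_min) t) x0 + t x1, and the network regresses the path's constant velocity u_t = x1 - (1 - σ_min) x0. Whether the paper uses exactly this path, this σ_min, or per-utterance time sampling is an assumption. `model` is a placeholder for a Diffusion Transformer conditioned on dysarthric-speech features, and `cfm_path` and `cfm_loss` are hypothetical names.

```python
import numpy as np

def cfm_path(x0, x1, t, sigma_min=1e-4):
    """Point x_t on the linear (OT) conditional path and its target velocity u_t."""
    tt = t[:, None, None]  # broadcast per-utterance times (B,) over (B, T, D)
    x_t = (1.0 - (1.0 - sigma_min) * tt) * x0 + tt * x1
    u_t = x1 - (1.0 - sigma_min) * x0  # d x_t / d t; constant along the path
    return x_t, u_t

def cfm_loss(model, x1, cond, rng, sigma_min=1e-4):
    """One Monte Carlo estimate of the CFM regression loss for a batch of clean targets x1."""
    x0 = rng.standard_normal(x1.shape)           # Gaussian noise, same shape as the target
    t = rng.uniform(0.0, 1.0, size=x1.shape[0])  # one flow time per utterance
    x_t, u_t = cfm_path(x0, x1, t, sigma_min)
    v = model(x_t, t, cond)                      # hypothetical DiT conditioned on dysarthric features
    return float(np.mean((v - u_t) ** 2))

def zero_model(x_t, t, cond):
    """Trivial placeholder network that always predicts zero velocity."""
    return np.zeros_like(x_t)

rng = np.random.default_rng(0)
x1 = rng.standard_normal((2, 5, 3))  # (batch, frames, feature dim): hypothetical clean targets
print(cfm_loss(zero_model, x1, cond=None, rng=rng))
```

Because the regression target is a simple closed-form velocity rather than a denoising schedule, each training step needs only one network evaluation per utterance.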

Findings

01 Discrete acoustic units improve speech intelligibility
02 Faster convergence than mel-spectrogram-based methods
03 Effective speaker variability mitigation

Abstract

Dysarthria is a neurological disorder that significantly impairs speech intelligibility, often rendering affected individuals unable to communicate effectively. This necessitates the development of robust dysarthric-to-regular speech conversion techniques. In this work, we investigate the utility and limitations of self-supervised learning (SSL) features and their quantized representations as an alternative to mel-spectrograms for speech generation. Additionally, we explore methods to mitigate speaker variability by generating clean speech in a single-speaker voice using features extracted from WavLM. To this end, we propose a fully non-autoregressive approach that leverages Conditional Flow Matching (CFM) with Diffusion Transformers to learn a direct mapping from dysarthric to clean speech. Our findings highlight the effectiveness of discrete acoustic units in improving intelligibility…
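CFM learns a velocity field rather than a next-frame predictor, so generation can be non-autoregressive. Every frame is produced at once by integrating dx/dt = v(x, t, c) from Gaussian noise at t = 0 to t = 1. The sketch below assumes a fixed-step Euler solver, since the paper's solver and step count are not given here. `model`, `cond`, and `euler_sample` are hypothetical placeholders. Turning the generated features into a waveform would presumably require a separate vocoder or unit-to-speech stage, which is not shown.

```python
import numpy as np

def euler_sample(model, cond, shape, n_steps=10, rng=None):
    """Integrate dx/dt = model(x, t, cond) from t=0 (noise) to t=1 with fixed Euler steps."""
    if rng is None:
        rng = np.random.default_rng()
    x = rng.standard_normal(shape)      # all frames start as Gaussian noise at t = 0
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = np.full(shape[0], i * dt)   # current flow time for each utterance in the batch
        x = x + dt * model(x, t, cond)  # every frame is updated in parallel, not left to right
    return x

# Placeholder velocity field that ignores its inputs and always points along `shift`.
shift = np.ones((1, 4, 2))

def const_model(x, t, cond):
    return shift

out = euler_sample(const_model, cond=None, shape=(1, 4, 2), n_steps=8,
                   rng=np.random.default_rng(3))
```

The number of network calls is fixed by `n_steps` and does not grow with utterance length. This is the practical appeal of the non-autoregressive formulation compared with frame-by-frame decoding.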

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Phonocardiography and Auscultation Techniques

Methods: Diffusion