Improved Intelligibility of Dysarthric Speech using Conditional Flow Matching
Shoutrik Das, Nishant Singh, Arjun Gangwar, S Umesh

TL;DR
This paper introduces a non-autoregressive speech conversion method that uses Conditional Flow Matching with Diffusion Transformers. It improves the intelligibility of dysarthric speech by leveraging SSL features and discrete acoustic units.
Contribution
The paper proposes a non-autoregressive approach for dysarthric speech conversion that combines CFM with Diffusion Transformers. It improves both intelligibility and training convergence speed.
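The CFM training objective can be illustrated with a minimal NumPy sketch. It assumes the widely used optimal-transport path from Gaussian noise x0 to clean target features x1. The sketch is not taken from the paper, which may use a different path or a different value of `sigma_min`. Here `velocity_fn` is a hypothetical stand-in for the Diffusion Transformer, and `cond` stands for the dysarthric conditioning features. How those features are aligned with the target frames is not specified here, so the sketch passes them through unchanged.

```python
import numpy as np

def cfm_training_pair(x1, cond, rng, sigma_min=1e-4):
    """Build one conditional flow matching example (OT path; an assumption).

    x1:   clean target features, shape (frames, dim)
    cond: dysarthric conditioning features, passed through unchanged
    Returns (x_t, t, cond, u_t); u_t is the velocity the network should regress.
    """
    x0 = rng.standard_normal(x1.shape)            # Gaussian noise sample
    t = rng.uniform()                             # time drawn from [0, 1)
    # Point on the straight-line path between noise (t = 0) and data (t = 1)
    x_t = (1.0 - (1.0 - sigma_min) * t) * x0 + t * x1
    # Time derivative of x_t, i.e. the target velocity along the path
    u_t = x1 - (1.0 - sigma_min) * x0
    return x_t, t, cond, u_t

def cfm_loss(velocity_fn, x1, cond, rng, sigma_min=1e-4):
    """Mean squared error between predicted and target velocity."""
    x_t, t, cond, u_t = cfm_training_pair(x1, cond, rng, sigma_min)
    pred = velocity_fn(x_t, t, cond)              # hypothetical network call
    return float(np.mean((pred - u_t) ** 2))
```

With `sigma_min=0`, x_t + (1 - t) * u_t equals x1 exactly, which confirms that the target velocity points straight at the clean features.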
Findings
Discrete acoustic units improve speech intelligibility
Faster convergence than mel-spectrogram-based methods
Speaker variability is reduced by generating clean speech in a single speaker's voice
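Discrete acoustic units are usually obtained by clustering frame-level SSL features and replacing each frame with the index of its nearest centroid. The following is a minimal k-means sketch of that idea in NumPy. The page does not state the clustering method, the number of units, or which WavLM layer is used, so `k`, `iters`, and the function names are hypothetical. Random arrays can stand in for real features.

```python
import numpy as np

def assign_units(features, centroids):
    """Map each frame (row) to the index of its nearest centroid."""
    # Squared Euclidean distance from every frame to every centroid
    d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def fit_kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means over frame features of shape (frames, dim)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct frames
    centroids = features[rng.choice(len(features), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = assign_units(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members):                      # keep old centroid if cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids
```

Once the centroids are fitted on training features, `assign_units` turns any utterance's feature matrix into a sequence of integer unit IDs.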
Abstract
Dysarthria is a neurological disorder that significantly impairs speech intelligibility, often rendering affected individuals unable to communicate effectively. This necessitates the development of robust dysarthric-to-regular speech conversion techniques. In this work, we investigate the utility and limitations of self-supervised learning (SSL) features and their quantized representations as an alternative to mel-spectrograms for speech generation. Additionally, we explore methods to mitigate speaker variability by generating clean speech in a single-speaker voice using features extracted from WavLM. To this end, we propose a fully non-autoregressive approach that leverages Conditional Flow Matching (CFM) with Diffusion Transformers to learn a direct mapping from dysarthric to clean speech. Our findings highlight the effectiveness of discrete acoustic units in improving intelligibility…
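At inference, a CFM model generates output non-autoregressively. It integrates the learned velocity field from noise at t = 0 to data at t = 1 and updates every frame at once. The sketch below uses a fixed-step Euler solver, which is an assumption: the paper may use a different ODE solver or step count. `velocity_fn` again stands in for the trained network.

```python
import numpy as np

def sample_euler(velocity_fn, cond, shape, steps=10, seed=0):
    """Integrate dx/dt = v(x, t, cond) from t = 0 to t = 1 with Euler steps."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(shape)                # start from Gaussian noise
    h = 1.0 / steps
    for k in range(steps):
        t = k * h
        # One update for the whole (frames, dim) array: no frame-by-frame loop
        x = x + h * velocity_fn(x, t, cond)
    return x
```

As a check, the single-target velocity (target - x) / (1 - t) drives the sampler exactly onto `target` in the final step. This oracle field is only a test device and is not a trained model.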
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Phonocardiography and Auscultation Techniques
Methods: Diffusion
