DuTa-VC: A Duration-aware Typical-to-atypical Voice Conversion Approach with Diffusion Probabilistic Model
Helin Wang, Thomas Thebaud, Jesus Villalba, Myra Sydnor, Becky Lammers, Najim Dehak, Laureano Moro-Velazquez

TL;DR
DuTa-VC is a novel voice conversion method that uses diffusion models, works with nonparallel data, preserves speaker identity, and is aware of phoneme duration, improving dysarthric speech synthesis and recognition.
Contribution
It introduces a diffusion probabilistic model for typical-to-atypical voice conversion that preserves speaker identity and phoneme duration, trained on nonparallel data.
Findings
Captures severity characteristics of dysarthric speech
Preserves speaker identity effectively
Enhances dysarthric speech recognition
Abstract
We present a novel typical-to-atypical voice conversion approach (DuTa-VC), which (i) can be trained with nonparallel data, (ii) is the first to introduce a diffusion probabilistic model, (iii) preserves the target speaker identity, and (iv) is aware of the phoneme duration of the target speaker. DuTa-VC consists of three parts: an encoder transforms the source mel-spectrogram into a duration-modified speaker-independent mel-spectrogram, a decoder performs the reverse diffusion to generate the target mel-spectrogram, and a vocoder is applied to reconstruct the waveform. Objective evaluations conducted on UASpeech show that DuTa-VC is able to capture severity characteristics of dysarthric speech, preserves speaker identity, and significantly improves dysarthric speech recognition when used for data augmentation. Subjective evaluations by two expert speech pathologists validate that DuTa-VC can preserve the…
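To make the "duration-modified" step in the abstract more concrete, here is a minimal sketch of one way phoneme-level duration modification of a mel-spectrogram could look. It is an illustration under assumptions, not the authors' implementation: the function names `stretch_segment` and `modify_durations`, the use of linear interpolation, and the idea that phoneme boundaries and target durations are given as frame counts are all hypothetical choices made for this example. The diffusion decoder and the vocoder are not sketched here.

```python
import numpy as np


def stretch_segment(segment: np.ndarray, target_frames: int) -> np.ndarray:
    """Resample one (n_mels, frames) phoneme segment to target_frames.

    Hypothetical helper: linear interpolation along the time axis is an
    assumption for illustration, not the method described in the paper.
    The segment is expected to contain at least one frame.
    """
    n_mels, frames = segment.shape
    if target_frames <= 0:
        # A phoneme with zero target duration contributes no frames.
        return np.zeros((n_mels, 0), dtype=segment.dtype)
    if frames == 1:
        # A single frame can only be repeated.
        return np.repeat(segment, target_frames, axis=1)
    src_pos = np.arange(frames)
    dst_pos = np.linspace(0, frames - 1, target_frames)
    # Interpolate each mel bin independently over time.
    return np.stack([np.interp(dst_pos, src_pos, row) for row in segment])


def modify_durations(mel, boundaries, target_durations):
    """Stretch or compress each phoneme segment of a mel-spectrogram.

    mel:              array of shape (n_mels, T)
    boundaries:       list of (start, end) frame indices, one per phoneme
                      (assumed to come from a forced aligner)
    target_durations: list of frame counts, one per phoneme
                      (assumed to reflect the target speaker's timing)
    """
    pieces = [
        stretch_segment(mel[:, start:end], frames)
        for (start, end), frames in zip(boundaries, target_durations)
    ]
    return np.concatenate(pieces, axis=1)
```

For example, with a 6-frame input split into phonemes at frames (0, 2) and (2, 6) and target durations of 4 and 2 frames, the first phoneme is slowed down and the second is shortened, while the total length happens to stay at 6 frames.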
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Dysphagia Assessment and Management
Methods: Diffusion · Attentive Walk-Aggregating Graph Neural Network
