A Preliminary Study of a Two-Stage Paradigm for Preserving Speaker Identity in Dysarthric Voice Conversion

Wen-Chin Huang; Kazuhiro Kobayashi; Yu-Huai Peng; Ching-Feng Liu; Yu Tsao; Hsin-Min Wang; Tomoki Toda

arXiv:2106.01415 · cs.SD · June 4, 2021

TL;DR

This paper introduces a novel two-stage voice conversion paradigm that enhances dysarthric speech quality and preserves speaker identity without requiring normal speech of the patient, using sequence-to-sequence and variational autoencoder models.

Contribution

A new two-stage dysarthric voice conversion (DVC) framework that does not need the patient's normal speech, improving speech quality while maintaining speaker identity.

Findings

01. Improved speech quality in dysarthric voice conversion.

02. Effective preservation of speaker identity.

03. Flexible model design options demonstrated.

Abstract

We propose a new paradigm for maintaining speaker identity in dysarthric voice conversion (DVC). The poor quality of dysarthric speech can be greatly improved by statistical VC, but because normal speech utterances of a dysarthria patient are nearly impossible to collect, previous work failed to recover the individuality of the patient. In light of this, we suggest a novel, two-stage approach for DVC, which is highly flexible in that no normal speech of the patient is required. First, a powerful parallel sequence-to-sequence model converts the input dysarthric speech into the normal speech of a reference speaker as an intermediate product. Then, a nonparallel, frame-wise VC model realized with a variational autoencoder converts the speaker identity of the reference speech back to that of the patient, while it is assumed to be capable of preserving the enhanced quality. We investigate several…
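The data flow of the two stages described in the abstract can be sketched roughly as follows. This is only an illustrative toy, not the authors' implementation: the function names (`stage1_seq2seq`, `stage2_framewise_vae`, `two_stage_dvc`), the `patient_code` vector, and the arithmetic inside each stage are all hypothetical stand-ins for trained models. The sketch shows just two structural points: the first stage may change the number of frames, while the second stage works frame by frame and therefore keeps it.

```python
from typing import List

Frame = List[float]       # one acoustic feature vector (hypothetical)
Utterance = List[Frame]   # a sequence of frames


def stage1_seq2seq(dysarthric: Utterance) -> Utterance:
    """Stand-in for the parallel sequence-to-sequence model.

    A real model would map dysarthric speech to normal speech of a
    reference speaker. Here we simply keep every other frame, only to
    mimic that a seq2seq model can change the sequence length.
    """
    return [list(frame) for frame in dysarthric[::2]]


def stage2_framewise_vae(reference: Utterance, speaker_code: Frame) -> Utterance:
    """Stand-in for the nonparallel, frame-wise VAE-based VC model.

    Each frame is "encoded" and "decoded" independently, conditioned on
    a target speaker code, so the number of frames is unchanged.
    """
    converted = []
    for frame in reference:
        latent = [0.5 * x for x in frame]  # toy "encoder" (assumption)
        converted.append([z + s for z, s in zip(latent, speaker_code)])  # toy "decoder"
    return converted


def two_stage_dvc(dysarthric: Utterance, patient_code: Frame) -> Utterance:
    """Chain the two stages: dysarthric -> reference speaker -> patient identity."""
    intermediate = stage1_seq2seq(dysarthric)
    return stage2_framewise_vae(intermediate, patient_code)


# Toy input: 6 frames of 2-dimensional features.
dysarthric = [[float(t), float(t) + 1.0] for t in range(6)]
patient_code = [0.1, -0.1]  # hypothetical speaker representation
converted = two_stage_dvc(dysarthric, patient_code)
```

In this toy run the intermediate product has 3 frames, and the final output keeps those 3 frames because the second stage never looks across frame boundaries.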

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Speech and Audio Processing