Generative Adversarial Networks for Unpaired Voice Transformation on Impaired Speech
Li-Wei Chen, Hung-Yi Lee, Yu Tsao

TL;DR
This paper introduces a novel end-to-end GAN-based unsupervised voice conversion method that transforms impaired speech into normal speech, enhancing intelligibility without requiring parallel data, and outperforms existing CycleGAN techniques.
Contribution
It presents the first end-to-end GAN model for unsupervised voice conversion specifically targeting impaired speech, preserving linguistic content and speaker identity.
Findings
Outperforms CycleGAN in transforming impaired speech
Preserves linguistic content and speaker characteristics
Effective for improving speech intelligibility in surgical patients
Abstract
This paper focuses on using voice conversion (VC) to improve the speech intelligibility of surgical patients who have had parts of their articulators removed. Due to the difficulty of data collection, VC without parallel data is highly desired. Although techniques for unparallel VC, for example, CycleGAN, have been developed, they usually focus on transforming the speaker identity, and directly transforming the speech of one speaker to that of another speaker and as such do not address the task here. In this paper, we propose a new approach for unparallel VC. The proposed approach transforms impaired speech to normal speech while preserving the linguistic content and speaker characteristics. To our knowledge, this is the first end-to-end GAN-based unsupervised VC model applied to impaired speech. The experimental results show that the proposed approach outperforms CycleGAN.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSpeech Recognition and Synthesis · Voice and Speech Disorders · Speech and Audio Processing
MethodsBatch Normalization · Residual Connection · PatchGAN · *Communicated@Fast*How Do I Communicate to Expedia? · Tanh Activation · Residual Block · Instance Normalization · Convolution · HuMan(Expedia)||How do I get a human at Expedia? · Sigmoid Activation
