Non-parallel voice conversion based on source-to-target direct mapping

Sunghee Jung, Youngjoo Suh, Yeunju Choi, and Hoirin Kim

arXiv:2006.06937 · eess.AS · June 15, 2020

Open Access

TL;DR

This paper introduces a single-network approach to non-parallel voice conversion that maps source voice parameters directly to target voice parameters, reducing conversion time and improving voice similarity relative to PPG-based methods.

Contribution

The proposed method simplifies non-parallel voice conversion by eliminating cascading networks and phonetic recognizers, enabling real-time application and improved voice quality.

Findings

01. Reduces network parameters by 41.9%.

02. Decreases conversion time by 44.5%.

03. Achieves better voice similarity than PPG-based methods.

Abstract

Recent work using phonetic posteriorgrams (PPGs) for non-parallel voice conversion has significantly increased the usability of voice conversion, since the source and target databases no longer need to have matching content. In this approach, the PPGs serve as the linguistic bridge between source and target speaker features. However, PPG-based non-parallel voice conversion has the limitation that it needs two cascading networks at conversion time, which makes it less suitable for real-time applications and vulnerable to the intelligibility of the source speaker at the conversion stage. To address this limitation, we propose a new non-parallel voice conversion technique that employs a single neural network for direct source-to-target voice parameter mapping. With this single-network structure, the proposed approach can reduce both the conversion time and the number of network parameters, which can be…
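The structural contrast described in the abstract can be sketched in a few lines of Python. This is a toy illustration only, not the authors' implementation: the use of small fully connected networks, the names `recognizer`, `synthesizer`, and `direct`, and the sizes `FEAT_DIM`, `PPG_DIM`, and `HIDDEN` are all assumptions made for the example, and the weights are random rather than trained. It shows only that the PPG-based route runs two networks in sequence at conversion time, while the direct route runs one.

```python
import numpy as np

def init_mlp(sizes, rng):
    """Return a list of (weight, bias) pairs for a fully connected network."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """Run x through the network: tanh hidden layers, linear output layer."""
    for i, (w, b) in enumerate(params):
        x = x @ w + b
        if i < len(params) - 1:
            x = np.tanh(x)
    return x

def softmax(z):
    """Row-wise softmax, so each frame's PPG sums to 1."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def count_params(params):
    """Total number of weights and biases in a network."""
    return sum(w.size + b.size for w, b in params)

rng = np.random.default_rng(0)

# Hypothetical sizes chosen for illustration, not taken from the paper.
FEAT_DIM, PPG_DIM, HIDDEN = 40, 64, 128

# PPG-based route: two cascading networks at conversion time.
recognizer = init_mlp([FEAT_DIM, HIDDEN, PPG_DIM], rng)    # source features -> PPG
synthesizer = init_mlp([PPG_DIM, HIDDEN, FEAT_DIM], rng)   # PPG -> target features

# Direct route: a single network from source to target features.
direct = init_mlp([FEAT_DIM, HIDDEN, FEAT_DIM], rng)

def convert_cascade(frames):
    """Two forward passes: recognize PPGs, then synthesize target features."""
    ppg = softmax(forward(recognizer, frames))
    return forward(synthesizer, ppg)

def convert_direct(frames):
    """One forward pass from source features to target features."""
    return forward(direct, frames)

source_frames = rng.standard_normal((100, FEAT_DIM))  # 100 random stand-in frames
cascade_out = convert_cascade(source_frames)
direct_out = convert_direct(source_frames)

cascade_params = count_params(recognizer) + count_params(synthesizer)
direct_params = count_params(direct)
print("cascade parameters:", cascade_params)
print("direct parameters: ", direct_params)
```

With these made-up sizes the single network is smaller simply because one of the two networks is gone. The counts it prints are properties of the toy configuration and say nothing about the 41.9% and 44.5% figures reported for the actual system.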

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing