Non-parallel voice conversion based on source-to-target direct mapping
Sunghee Jung, Youngjoo Suh, Yeunju Choi, and Hoirin Kim

TL;DR
This paper introduces a single neural network approach for non-parallel voice conversion that directly maps source to target voice parameters, enhancing speed and quality over traditional PPG-based methods.
Contribution
The proposed method simplifies non-parallel voice conversion by eliminating cascading networks and phonetic recognizers, enabling real-time application and improved voice quality.
Findings
Reduces network parameters by 41.9%.
Decreases conversion time by 44.5%.
Achieves better voice similarity than PPG-based methods.
Abstract
Recent work utilizing phonetic posteriorgrams (PPGs) for non-parallel voice conversion has significantly increased the usability of voice conversion, since the source and target databases are no longer required to have matching content. In this approach, the PPGs are used as the linguistic bridge between source and target speaker features. However, PPG-based non-parallel voice conversion has a limitation: it needs two cascading networks at conversion time, making it less suitable for real-time applications and vulnerable to source speaker intelligibility at the conversion stage. To address this limitation, we propose a new non-parallel voice conversion technique that employs a single neural network for direct source-to-target voice parameter mapping. With this single network structure, the proposed approach can reduce both conversion time and number of network parameters, which can be…
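The structural difference described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feed-forward layers, the dimensions (`FEAT_DIM`, `PPG_DIM`, `HIDDEN`), and all function names are hypothetical, the weights are random and untrained, and the training procedure for the direct network is not shown. It only contrasts the two-network PPG cascade with a single direct network at conversion time.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(sizes):
    """Return a list of (weight, bias) pairs for a fully connected network."""
    return [(rng.standard_normal((m, n)) * 0.1, np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(layers, x):
    """Apply tanh hidden layers followed by a linear output layer."""
    for w, b in layers[:-1]:
        x = np.tanh(x @ w + b)
    w, b = layers[-1]
    return x @ w + b

def softmax(z):
    """Row-wise softmax, so each frame becomes a posterior distribution."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def n_params(layers):
    """Count weights and biases."""
    return sum(w.size + b.size for w, b in layers)

# Hypothetical dimensions, NOT taken from the paper.
FEAT_DIM, PPG_DIM, HIDDEN = 40, 128, 256

# PPG-based cascade: network 1 maps source features to PPGs,
# network 2 maps PPGs to target speaker features.
recognizer = make_mlp([FEAT_DIM, HIDDEN, PPG_DIM])
synthesizer = make_mlp([PPG_DIM, HIDDEN, FEAT_DIM])

# Direct mapping: a single network from source to target features.
direct = make_mlp([FEAT_DIM, HIDDEN, FEAT_DIM])

def convert_cascade(src):
    """Conversion with two cascaded networks (two forward passes)."""
    ppg = softmax(forward(recognizer, src))
    return forward(synthesizer, ppg)

def convert_direct(src):
    """Conversion with one network (one forward pass)."""
    return forward(direct, src)

# 100 random frames stand in for real source speaker features.
source_frames = rng.standard_normal((100, FEAT_DIM))
cascade_out = convert_cascade(source_frames)
direct_out = convert_direct(source_frames)

cascade_params = n_params(recognizer) + n_params(synthesizer)
direct_params = n_params(direct)
print("cascade parameters:", cascade_params)
print("direct parameters:", direct_params)
```

With these toy sizes the direct network is smaller simply because the recognizer stage is dropped. The resulting ratio is an artifact of the chosen dimensions and should not be compared with the figures reported under Findings.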
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
