End-to-End Voice Conversion with Information Perturbation

Qicong Xie; Shan Yang; Yi Lei; Lei Xie; Dan Su

arXiv:2206.07569·eess.AS·June 16, 2022

TL;DR

This paper introduces an end-to-end voice conversion method that uses information perturbation and specialized encoders to improve naturalness, speaker similarity, and prosody transfer in converted speech.

Contribution

It proposes a novel end-to-end framework with information perturbation and a speaker-related pitch encoder for high-quality voice conversion.

Findings

01 Outperforms state-of-the-art models in naturalness and speaker similarity

02 Effectively transfers source prosody and maintains target speaker timbre

03 Enhances speech intelligibility and quality

Abstract

The ideal goal of voice conversion is to convert the source speaker's speech so that it sounds naturally like the target speaker while maintaining the linguistic content and the prosody of the source speech. However, current approaches are insufficient to achieve comprehensive source prosody transfer and target speaker timbre preservation in the converted speech, and the quality of the converted speech is also unsatisfactory due to the mismatch between the acoustic model and the vocoder. In this paper, we leverage the recent advances in information perturbation and propose a fully end-to-end approach to conduct high-quality voice conversion. We first adopt information perturbation to remove speaker-related information in the source speech, disentangling speaker timbre from linguistic content, so that the linguistic information can subsequently be modeled by a content encoder. To better transfer the…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing