Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers

Liumeng Xue; Shan Yang; Na Hu; Dan Su; Lei Xie

arXiv:2207.00756·cs.SD·July 5, 2022

Open Access

TL;DR

This paper introduces a noise-independent speech representation learning method, built on Glow-WaveGAN, for high-quality voice conversion when the target speaker's speech data is contaminated by noise.

Contribution

It proposes a noise-invariant latent feature learning approach that combines a noise-controllable WaveGAN with a flow-based model, improving voice conversion quality when the target speaker's data is noisy.

Findings

01. Achieves high speech quality in noisy voice conversion

02. Maintains speaker similarity despite noise interference

03. Demonstrates robustness with contaminated speech data

Abstract

Building a voice conversion system for noisy target speakers, such as users providing noisy samples or data found on the Internet, is a challenging task, since using contaminated speech in model training clearly degrades conversion performance. In this paper, we leverage the advances of our recently proposed Glow-WaveGAN and propose a noise-independent speech representation learning approach for high-quality voice conversion for noisy target speakers. Specifically, we learn a latent feature space in which the target distribution modeled by the conversion model is exactly the distribution modeled by the waveform generator. On this premise, we further make the latent feature noise-invariant. To this end, we introduce a noise-controllable WaveGAN, which directly learns the noise-independent acoustic representation from the waveform by the encoder and…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing