Learning Noise-independent Speech Representation for High-quality Voice Conversion for Noisy Target Speakers
Liumeng Xue, Shan Yang, Na Hu, Dan Su, Lei Xie

TL;DR
This paper introduces a noise-independent speech representation learning method, built on Glow-WaveGAN, for high-quality voice conversion when the target speaker's speech data is contaminated by noise.
Contribution
It proposes a noise-invariant latent feature learning approach that combines a noise-controllable WaveGAN with a flow-based model, improving voice conversion quality when the target speaker's data is noisy.
Findings
Achieves high speech quality in noisy voice conversion
Maintains speaker similarity despite noise interference
Demonstrates robustness with contaminated speech data
Abstract
Building a voice conversion system for noisy target speakers, such as users providing noisy samples or data found on the Internet, is a challenging task, since using contaminated speech in model training clearly degrades conversion performance. In this paper, we leverage the advances of our recently proposed Glow-WaveGAN and propose a noise-independent speech representation learning approach for high-quality voice conversion for noisy target speakers. Specifically, we learn a latent feature space in which the target distribution modeled by the conversion model comes exactly from the distribution modeled by the waveform generator. On this premise, we further make the latent feature noise-invariant. To do so, we introduce a noise-controllable WaveGAN, which directly learns the noise-independent acoustic representation from waveform by the encoder and…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing
