An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions
Yeonjong Choi, Chao Xie, Tomoki Toda

TL;DR
This paper introduces a three-stage voice conversion framework designed to handle noisy and reverberant speech conditions, demonstrating improved performance over baseline methods in challenging real-world scenarios.
Contribution
A novel three-stage VC framework utilizing denoising, dereverberation, and variational autoencoder-based VC to improve voice conversion in noisy and reverberant environments.
Findings
The proposed method significantly outperforms baseline models on noisy-reverberant data.
Noise and reverberation cause notable VC performance degradation.
The denoising and dereverberation steps themselves still introduce some adverse effects.
Abstract
This paper presents a new voice conversion (VC) framework capable of dealing with both additive noise and reverberation, together with its performance evaluation. Several VC studies have focused on real-world circumstances in which speech data are corrupted by background noise and reverberation. To deal with more practical conditions where no clean target dataset is available, one possible approach is zero-shot VC, but its performance tends to degrade compared with VC trained on a sufficient amount of target speech data. To leverage a large amount of noisy-reverberant target speech data, we propose a three-stage VC framework consisting of a denoising process using a pretrained denoising model, a dereverberation process using a dereverberation model, and a VC process using a nonparallel VC model based on a variational autoencoder. The experimental results show that 1) noise and reverberation…
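The abstract describes the framework as a cascade of three models applied in a fixed order. The Python sketch below illustrates only that ordering, assuming each model can be treated as a waveform-to-waveform function. The class `ThreeStageVC`, the method `prepare_training_data`, and the toy stage functions are hypothetical placeholders, not the authors' models or code; in particular, the "VC" stage here does no conversion at all.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Assumption for illustration: a waveform is a plain list of float samples.
Waveform = List[float]
Stage = Callable[[Waveform], Waveform]


@dataclass
class ThreeStageVC:
    """Hypothetical cascade: denoise -> dereverberate -> convert.

    Each field is any waveform-to-waveform callable. In the paper these roles
    are filled by a pretrained denoising model, a dereverberation model, and a
    VAE-based nonparallel VC model; none of them is implemented here.
    """

    denoise: Stage
    dereverb: Stage
    convert: Stage

    def enhance(self, wav: Waveform) -> Waveform:
        # Stages 1 and 2: suppress additive noise first, then reverberation.
        return self.dereverb(self.denoise(wav))

    def prepare_training_data(self, corpus: Sequence[Waveform]) -> List[Waveform]:
        # Assumed use: enhance the noisy-reverberant target corpus before VC training.
        return [self.enhance(wav) for wav in corpus]

    def __call__(self, wav: Waveform) -> Waveform:
        # Stage 3: run voice conversion on the enhanced signal.
        return self.convert(self.enhance(wav))


def toy_denoise(wav: Waveform, floor: float = 0.05) -> Waveform:
    # Placeholder "denoiser": zero out samples whose magnitude is below a floor.
    return [x if abs(x) > floor else 0.0 for x in wav]


def toy_dereverb(wav: Waveform, a: float = 0.5) -> Waveform:
    # Placeholder "dereverberator": exactly inverts a known single-echo model
    # y[n] = x[n] + a * x[n-1] via the recursion x[n] = y[n] - a * x[n-1].
    out: Waveform = []
    prev = 0.0
    for y in wav:
        x = y - a * prev
        out.append(x)
        prev = x
    return out


def toy_convert(wav: Waveform) -> Waveform:
    # Placeholder for the VAE-based VC model: passes the signal through unchanged.
    return list(wav)


if __name__ == "__main__":
    pipeline = ThreeStageVC(toy_denoise, toy_dereverb, toy_convert)
    noisy_reverberant = [1.0, 0.5, 0.01, -0.02]
    print(pipeline(noisy_reverberant))  # -> [1.0, 0.0, 0.0, 0.0]
```

Keeping denoising and dereverberation as separate, swappable stages makes it straightforward to ablate or replace each front-end independently, which is useful when measuring how much each one contributes to, or detracts from, the final conversion quality.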
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
