An Evaluation of Three-Stage Voice Conversion Framework for Noisy and Reverberant Conditions

Yeonjong Choi; Chao Xie; Tomoki Toda

arXiv:2206.15155 · cs.SD · July 1, 2022

TL;DR

This paper introduces a three-stage voice conversion framework designed to handle noisy and reverberant speech conditions, demonstrating improved performance over baseline methods in challenging real-world scenarios.

Contribution

A novel three-stage VC framework utilizing denoising, dereverberation, and variational autoencoder-based VC to improve voice conversion in noisy and reverberant environments.
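
The three stages described above can be read as a simple sequential pipeline. The following is a minimal sketch of that chaining only: the stage functions (`denoise`, `dereverberate`, `convert_voice`) and the helper `make_pipeline` are hypothetical placeholders, not the models used in the paper, and the summary does not say whether the first two stages are applied to training data, to inference input, or to both.

```python
from typing import Callable, List, Sequence

Waveform = List[float]
Stage = Callable[[Waveform], Waveform]


def make_pipeline(stages: Sequence[Stage]) -> Stage:
    """Chain stages so that the output of each one feeds the next."""
    def run(wave: Waveform) -> Waveform:
        for stage in stages:
            wave = stage(wave)
        return wave
    return run


# Hypothetical stand-ins for the three stages (assumptions, not the paper's models).
def denoise(wave: Waveform) -> Waveform:
    # Placeholder: a crude noise gate that zeroes very small samples.
    return [x if abs(x) >= 0.05 else 0.0 for x in wave]


def dereverberate(wave: Waveform) -> Waveform:
    # Placeholder: identity; a real system would use a dereverberation model.
    return list(wave)


def convert_voice(wave: Waveform) -> Waveform:
    # Placeholder: polarity flip, standing in for a VAE-based nonparallel VC model.
    return [-x for x in wave]


# Order matters: denoising, then dereverberation, then voice conversion.
three_stage_vc = make_pipeline([denoise, dereverberate, convert_voice])
converted = three_stage_vc([0.01, 0.5, -0.3, 0.02])
```

Keeping each stage behind the same waveform-in, waveform-out signature makes it straightforward to swap a placeholder for a real pretrained model, or to drop a stage when comparing against a baseline.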

Findings

01 The proposed method significantly outperforms baseline models on noisy-reverberant data.

02 Noise and reverberation cause notable VC performance degradation.

03 Denoising and dereverberation steps still introduce some adverse effects.

Abstract

This paper presents a new voice conversion (VC) framework capable of dealing with both additive noise and reverberation, along with its performance evaluation. Several VC studies have focused on real-world circumstances where speech data are corrupted by background noise and reverberation. To deal with more practical conditions where no clean target dataset is available, one possible approach is zero-shot VC, but its performance tends to degrade compared with VC using a sufficient amount of target speech data. To leverage a large amount of noisy-reverberant target speech data, we propose a three-stage VC framework based on a denoising process using a pretrained denoising model, a dereverberation process using a dereverberation model, and a VC process using a nonparallel VC model based on a variational autoencoder. The experimental results show that 1) noise and reverberation…

Taxonomy

Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing